Abstract

Consider the ordinal optimization problem of finding a population amongst many with the smallest mean when these means are unknown but population samples can be generated via simulation. Typically, by selecting a population with the smallest sample mean, it can be shown that the false selection probability decays at an exponential rate. Lately, researchers have sought algorithms that guarantee that this probability is restricted to a small $\delta$ in order $\log(1/\delta)$ computational time by estimating the associated large deviations rate function via simulation. We show that such guarantees are misleading. En route, we identify the large deviations principle followed by the empirically estimated large deviations rate function, which may also be of independent interest. Further, we show a negative result: when populations have unbounded support, any policy that asymptotically identifies the correct population with probability at least $1-\delta$ for each problem instance requires more than $O(\log(1/\delta))$ samples in making such a determination in any problem instance. This suggests that some restrictions on the populations are essential to devise $O(\log(1/\delta))$ algorithms with $1-\delta$ correctness guarantees. We note that under restrictions on population moments, such methods are easily designed. We also observe that sequential methods from the stochastic multi-armed bandit literature can be adapted to devise such algorithms.


1 Introduction 

Suppose that we can sample independently from $d$ different random variables or ‘populations’ $(X(i) : i \le d)$. Further, from each population $i$, we can generate independent identically distributed (iid) samples $(X(i,j) : j \ge 1)$. The distribution of $(X(i) : i \le d)$ is not known and our aim is to find the ‘best’ population

$$i^* = \arg\min_{1 \le j \le d} EX(j).$$



Ordinal optimization rests on the observation that, under a light-tailed assumption on the distribution of $(X(i) : i \le d)$, ordinals are learnt correctly faster than the values of the expectations. (We say that a distribution is light-tailed if its moment generating function is finite in a neighbourhood of zero.) More specifically,

$$P\Big(\bar X_n(i^*) > \min_{j \ne i^*} \bar X_n(j)\Big)$$

decays exponentially in $n$, where $\bar X_n(i) = \frac{1}{n}\sum_{k=1}^n X(i,k)$, while it is well known through the central limit theorem that the rate of convergence of $\bar X_n(i)$ to $EX(i)$ is typically $n^{-1/2}$. This suggests that for small but positive $\delta$, one may be able to construct algorithms that generate $O(\log(1/\delta))$ samples of the $X(i)$'s and make a correct selection while restricting the probability of false selection to $\delta$. In this paper we critically examine this proposition, answering it in the negative in general light-tailed settings, and in the positive when further information on moments of the underlying random variables is available. Applications of ordinal optimisation in simulation arise in selecting the best design from a set of competing designs via simulation, where all of the designs may be modelled as discrete event dynamic systems. Such systems include queueing systems, computer and communications networks, manufacturing systems and transportation networks (see, e.g., Ho, Srinivas and Vakili 1992 for some applications).
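The contrast between exponential decay of the false selection probability and the $n^{-1/2}$ convergence of sample means can be seen in closed form in the simplest case. The sketch below assumes two Gaussian populations with unit variance whose means differ by `gap`, each sampled $n$ times; then $\bar X_n(i^*) - \bar X_n(j) \sim N(-\text{gap}, 2/n)$, so $P(FS) = \tfrac12\operatorname{erfc}(\text{gap}\sqrt{n}/2)$ with exponential rate $\text{gap}^2/4$. The function names are illustrative only.

```python
import math

def false_selection_prob(gap: float, n: int) -> float:
    """P(Xbar_n(i*) > Xbar_n(j)) for two N(mu, 1) populations whose means differ by gap.

    The difference of the two sample means is N(-gap, 2/n), so the probability
    that it is positive is Phi(-gap * sqrt(n / 2)) = 0.5 * erfc(gap * sqrt(n) / 2).
    """
    return 0.5 * math.erfc(gap * math.sqrt(n) / 2.0)

def empirical_decay_rate(gap: float, n: int) -> float:
    """-(1/n) log P(FS); tends to the large deviations rate gap**2 / 4 as n grows."""
    return -math.log(false_selection_prob(gap, n)) / n

def standard_error(n: int) -> float:
    """Standard error of one sample mean with unit variance: n**(-1/2), a polynomial decay."""
    return 1.0 / math.sqrt(n)

if __name__ == "__main__":
    gap = 0.5
    for n in (100, 1000, 4000):
        print(n, false_selection_prob(gap, n), empirical_decay_rate(gap, n), standard_error(n))
    print("limiting rate:", gap ** 2 / 4)
```

The probability shrinks by many orders of magnitude between $n = 100$ and $n = 4000$, while the standard error only shrinks by a factor of about 6.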

1.1 Brief literature review 

A brief literature survey is in order. Ho et al. (1992) observed that determining ordinals amongst population means is faster than estimating the means. Dai (1996) used large deviations to show in a fairly general framework that for light-tailed random variables the probability of false selection decays exponentially. Chen, Lin, Yucesan and Chick (2000) considered the problem of ordinal optimization under the assumption that a fixed but large computation budget $n$ is available and the underlying random variables have a Gaussian distribution. They attempted to optimize the budget allocated to each population asymptotically as $n \to \infty$ so that the probability of false selection is minimized. Glynn and Juneja (2004) observed that for $p_i > 0$, where $\bar X_i(k)$ denotes the sample mean of $k$ samples from population $i$,

$$P\Big(\bar X_{i^*}(p_{i^*} n) > \min_{i \ne i^*} \bar X_i(p_i n)\Big) \approx e^{-nH(p_1,\ldots,p_d)} \qquad (1)$$

and

$$\lim_{n\to\infty}\frac{1}{n}\log P\Big(\bar X_{i^*}(p_{i^*} n) > \min_{i \ne i^*} \bar X_i(p_i n)\Big) = -H(p_1,\ldots,p_d).$$

They then optimised this large deviations rate function $H(p_1,\ldots,p_d)$ under the constraint $\sum_{i=1}^d p_i = 1$ to determine the optimal allocations as $n \to \infty$, even for non-Gaussian distributions. A significant literature relying on large deviations analysis has appeared since then (e.g., Hunter and Pasupathy 2013, Szechtman and Yucesan 2008, Broadie, Han, Zeevi 2007, Blanchet, Liu, Zwart 2008).
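For Gaussian populations the rate function $H$ has a closed form, which makes the allocation problem easy to explore. The sketch below assumes independent $N(\mu_i, \sigma_i^2)$ populations with a unique best population and allocation fractions $p_i > 0$. Minimising $p_{i^*} I_{i^*}(x) + p_j I_j(x)$ over $x$, with $I_i(x) = (x-\mu_i)^2/(2\sigma_i^2)$, gives the pairwise rate $(\mu_j - \mu_{i^*})^2 / (2(\sigma_{i^*}^2/p_{i^*} + \sigma_j^2/p_j))$, and $H$ is the minimum of these over $j \ne i^*$. For $d = 2$ the maximiser is $p_1 = \sigma_1/(\sigma_1+\sigma_2)$, which a crude grid search recovers. The function names are hypothetical.

```python
from typing import Sequence

def gaussian_rate_H(mus: Sequence[float], sigmas: Sequence[float], ps: Sequence[float]) -> float:
    """Large deviations rate H(p_1, ..., p_d) of P(FS) for independent Gaussian populations.

    Assumes a unique population with the smallest mean; ps are the (positive)
    fractions of the budget n allocated to each population.
    """
    best = min(range(len(mus)), key=lambda i: mus[i])
    rates = []
    for j in range(len(mus)):
        if j == best:
            continue
        var = sigmas[best] ** 2 / ps[best] + sigmas[j] ** 2 / ps[j]
        rates.append((mus[j] - mus[best]) ** 2 / (2.0 * var))
    return min(rates)

def best_two_population_split(mus: Sequence[float], sigmas: Sequence[float], grid: int = 1000) -> float:
    """Grid search over p_1 in (0, 1) for d = 2, with p_2 = 1 - p_1."""
    candidates = [k / grid for k in range(1, grid)]
    return max(candidates, key=lambda p: gaussian_rate_H(mus, sigmas, [p, 1.0 - p]))

if __name__ == "__main__":
    print(gaussian_rate_H([0.0, 1.0], [1.0, 1.0], [0.5, 0.5]))
    print(best_two_population_split([0.0, 1.0], [1.0, 2.0]))  # near 1/3
```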
Substantial literature exists on selecting the best system amongst many alternatives using ranking/selection procedures (see, e.g., Kim and Nelson 2001, 2003, Nelson et al. 2001, Branke, Chick and Schmidt 2007 for an overview). The Gaussian assumption is critical to most of the analysis here. These approaches also consider the ‘indifference-zone formulation’, where it is assumed that there exists a known $\varepsilon > 0$ such that

$$EX(i^*) \le EX(j) - \varepsilon \qquad (2)$$

for $j \ne i^*$ (see, e.g., Nelson and Matejcik 1995). Such an $\varepsilon$ is then useful in devising rules for the number of samples needed to control the probability of false selection to pre-specified levels.

Ordinal optimisation methods are also related to the vast, elegant and evolving literature in the learning and statistics community referred to as stochastic multi-armed bandit methods. Typically, this strand of literature refers to sampling from a population as ‘pulling an arm’ and assumes that each such pull leads to a random reward whose distribution depends upon the population. The aim then is to develop an optimal or near optimal sequential sampling strategy that maximises the long term total expected reward or, equivalently, minimises the total expected regret. (The regret over $n$ trials is the difference between the reward that would have been realised in these trials if the ‘best’ arm was pulled each time and the actual reward realised by a given sequential strategy; here best refers to the arm with the highest expected reward. See, e.g., Bubeck and Cesa-Bianchi 2012, Cesa-Bianchi and Lugosi 2006 for surveys on these methods, and Lai and Robbins 1985 for a seminal paper in this area.) Optimal sequential strategies have to carefully manage the exploration-exploitation trade-offs in arm selections. Recently, these methods have also been used in the pure exploration setting, where the goal is to identify the arm with the highest expected reward in the minimum expected number of trials. See, e.g., Even-Dar, Mannor and Mansour (2002, 2006), Audibert and Bubeck (2010), Jamieson, Malloy, Nowak and Bubeck (2013). This pure exploration problem is identical to the ordinal optimisation problem that we consider (with the smallest mean in place of the largest). A standard assumption in both the regret minimisation and the pure exploration settings is that the rewards from each arm are either Bernoulli or bounded with known bounds. In a recent paper, Bubeck, Cesa-Bianchi and Lugosi (2013) consider the problem of minimising expected regret under the assumption that the rewards are unbounded but explicit bounds on their moments are known.
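To make the pure exploration setting concrete, here is a sketch of a successive-elimination scheme in the spirit of Even-Dar, Mannor and Mansour (2002, 2006), written for the smallest-mean convention of this paper. It assumes rewards in $[0, 1]$; the confidence radius $\sqrt{\log(4dt^2/\delta)/(2t)}$ is one standard Hoeffding-plus-union-bound choice and is not necessarily the one used in those papers. The function name and sampler interface are hypothetical.

```python
import math
import random
from typing import Callable, List

def successive_elimination(samplers: List[Callable[[], float]], delta: float,
                           max_rounds: int = 10**6) -> int:
    """Return the index of the population judged to have the smallest mean.

    Assumes every sampler returns values in [0, 1]. In round t every surviving
    population holds t samples. By Hoeffding's inequality and a union bound over
    populations and rounds, all confidence intervals hold simultaneously with
    probability at least 1 - delta.
    """
    d = len(samplers)
    alive = list(range(d))
    sums = [0.0] * d
    for t in range(1, max_rounds + 1):
        for i in alive:
            sums[i] += samplers[i]()
        radius = math.sqrt(math.log(4.0 * d * t * t / delta) / (2.0 * t))
        best_upper = min(sums[i] / t + radius for i in alive)
        # Eliminate populations whose lower confidence bound exceeds the smallest upper bound.
        alive = [i for i in alive if sums[i] / t - radius <= best_upper]
        if len(alive) == 1:
            return alive[0]
    # Round budget exhausted: fall back to the smallest empirical mean among survivors.
    return min(alive, key=lambda i: sums[i])

if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.5, 0.2, 0.6]  # hypothetical Bernoulli populations
    arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in means]
    print(successive_elimination(arms, delta=0.01))  # index 1 with high probability
```

Boundedness is what makes the radius computable in advance; without known bounds, no such radius is available, which is the issue the rest of this paper examines.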

1.2 Observations 

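Observe that if, as in (1), $P(FS) \le e^{-nI}$ for some $I > 0$, then

$$n = \frac{1}{I}\log(1/\delta)$$

ensures $P(FS) \le \delta$.

However, this relies on estimating $I$ from the samples generated. One hopes for algorithms that, with $n = O(\log(1/\delta))$, ensure that at least asymptotically $P(FS) \le \delta$, that is,

$$\limsup_{\delta\to 0} P(FS)\,\delta^{-1} \le 1, \qquad (3)$$

even when an indifference zone formulation is considered, so that there exists a fixed and known $\varepsilon > 0$ for which (2) holds.

Note that $O(\log(1/\delta))$ effort is necessary to achieve (3), in the sense that if only $\log(1/\delta)^{1-\varepsilon}$ samples are generated, for any small $\varepsilon > 0$, then, with $H$ a set of positive probability such that a false selection results whenever all generated samples fall in $H$,

$$P(FS) \ge P(X_1 \in H)^{\log(1/\delta)^{1-\varepsilon}} = \delta^{\log(1/P(X_1\in H))/\log(1/\delta)^{\varepsilon}} \gg \delta$$

as $\delta \to 0$.

Also observe that $O(\log(1/\delta)^{1+\varepsilon})$ samples are sufficient since, with $n = \log(1/\delta)^{1+\varepsilon}$,

$$\delta^{-1}P(FS) \le \delta^{-1}e^{-nI} = \delta^{-1}e^{-\log(1/\delta)^{1+\varepsilon} I} = \delta^{I\log(1/\delta)^{\varepsilon} - 1},$$

which goes to zero as $\delta \to 0$.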
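Both observations can be checked in log scale. The sketch below works with $L = \log(1/\delta)$ directly, because the relevant $\delta$ are far below floating-point range; `q` stands for $P(X_1 \in H)$ and `rate` for $I$, and both values used are arbitrary. With $n = L^{1-\varepsilon}$, $\log(q^n/\delta) = n\log q + L$ is eventually positive; with $n = L^{1+\varepsilon}$, $\log(\delta^{-1}e^{-nI}) = L - I L^{1+\varepsilon}$ is negative once $I L^{\varepsilon} > 1$. The function names are hypothetical.

```python
import math

def log_ratio_too_few(L: float, q: float, eps: float) -> float:
    """log(q**n / delta) with n = L**(1 - eps) and L = log(1/delta).

    A positive value means the all-samples-in-H event alone is more likely
    than delta, so n = L**(1 - eps) samples cannot guarantee P(FS) <= delta.
    """
    n = L ** (1.0 - eps)
    return n * math.log(q) + L

def log_ratio_enough(L: float, rate: float, eps: float) -> float:
    """log(exp(-n * rate) / delta) with n = L**(1 + eps); negative means P(FS) <= delta."""
    n = L ** (1.0 + eps)
    return -n * rate + L

if __name__ == "__main__":
    for L in (1e2, 1e4, 1e6):
        print(L, log_ratio_too_few(L, 0.5, 0.2), log_ratio_enough(L, 0.1, 0.2))
```

With `rate = 0.1` and `eps = 0.2`, the second quantity only turns negative once $\log(1/\delta)$ exceeds $10^5$, i.e. once $I\log(1/\delta)^{\varepsilon} > 1$; the asymptotic sufficiency can therefore be very slow to take effect.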

1.3 Our contributions 

We first consider two practically reasonable implementations that involve estimating the large deviations rate function $I$ associated with the probability of false selection from $O(\log(1/\delta))$ generated samples, and using this estimator as a proxy for $I$ to control $P(FS)$. We argue that there exist light-tailed distributions for which such methods would fail. En route, we conduct a large deviations analysis of the empirically estimated large deviations rate function. This is useful to our analysis and may also be of independent interest.

Our key result is negative and is as follows. Given any $(\varepsilon, \delta)$ algorithm that correctly separates populations with mean difference at least $\varepsilon$ with

$$\limsup_{\delta\to 0} P(FS)\,\delta^{-1} \le 1,$$

we prove that for populations with unbounded support, under mild restrictions, the expected number of samples cannot be $O(\log(1/\delta))$. This result also holds for the more restrictive $(\varepsilon, \delta)$ policies for which $P(FS) \le \delta$ for any given $\delta$, not just asymptotically. Further, similar results also hold when the criterion for selecting the best design is not the population mean but another function of the population distribution, such as a specified quantile.

Our positive contributions are as follows. Under explicitly available upper bounds on moments of convex, increasing functions of the underlying random variables, we develop truncation-based as well as capping-based $(\varepsilon, \delta)$ algorithms with $O(\log(1/\delta))$ computation time. We also observe that the sequential algorithms recently proposed for the multi-armed bandit regret setting with heavy tails (see Bubeck et al. 2013) are easily adapted to this pure exploration setting to provide sequential algorithms with $O(\log(1/\delta))$ computation time. We also develop upper bounds on the computational effort under these algorithms and suggest tweaks that may lead to minor performance improvements.

1.4 Roadmap 

In Section 2 we review some basic large deviations results and conduct a large deviations analysis of the empirical large deviations estimator. The results are useful to our analysis in Sections 3 and 4, where we discuss pitfalls of some standard approaches for selecting the best system that rely on an empirical estimator of the large deviations rate function. Specifically, in Section 3, we consider a standard two phase approach to ordinal optimisation adapted to rely on estimating the large deviations rate function, and point out its drawbacks. In Section 4, we consider a reasonable sequential approach that relies on the estimated large deviations rate function and illustrate cases where it may fail. Section 5 contains our key negative result, showing that, under mild regularity conditions, no algorithm can run in $O(\log(1/\delta))$ time and control the probability of false selection to within a small $\delta$. In Sections 6 and 7 we provide some positive results. In Section 6, we develop $O(\log(1/\delta))$ algorithms when upper bounds on suitable moments of the underlying random variables are available. In Section 7, we adapt recent results from multi-armed bandit research to our ordinal optimisation setting.

All the proofs are given in the appendix. 

Note that this is a rough draft prepared to facilitate early dissemination. Versions with improved structure and fewer errors should (hopefully!) update this draft soon.

2 Large deviations analysis of the empirical large deviations estimator

We first review Cramér's theorem, which is critical to our analysis. Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. samples of $X$ and $a > EX$. Let $\Lambda(\theta) = \log Ee^{\theta X}$ denote the log moment generating function of $X$ and let

$$I(a) = \sup_{\theta}(\theta a - \Lambda(\theta))$$

denote its Legendre transform. Then it follows from Cramér's theorem that

$$P(\bar X_n \ge a) \le \exp\Big(-n\inf_{x \ge a} I(x)\Big) = \exp(-nI(a)).$$

Furthermore,

$$\lim_{n\to\infty}\frac{1}{n}\log P(\bar X_n \ge a) = -I(a).$$

$I(\cdot)$ then denotes the large deviations rate function of $X$. It is a nonnegative convex function that equals 0 at $EX$.
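As a numerical check of these definitions, the sketch below evaluates $I(a) = \sup_\theta(\theta a - \Lambda(\theta))$ by ternary search, which is valid because $\theta a - \Lambda(\theta)$ is concave in $\theta$. It assumes that the maximiser lies in a bounded search interval (here $[-50, 50]$, an arbitrary choice) and compares the result with the Bernoulli closed form $I(a) = a\log(a/p) + (1-a)\log((1-a)/(1-p))$ for $0 < a < 1$. The function names are hypothetical.

```python
import math
from typing import Callable

def legendre_transform(log_mgf: Callable[[float], float], a: float,
                       lo: float = -50.0, hi: float = 50.0, iters: int = 200) -> float:
    """Numerically compute I(a) = sup_theta (theta * a - log_mgf(theta)).

    Ternary search is valid because theta * a - Lambda(theta) is concave in
    theta; the maximiser is assumed to lie in [lo, hi].
    """
    f = lambda t: t * a - log_mgf(t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the maximiser is not in [lo, m1]
        else:
            hi = m2  # the maximiser is not in [m2, hi]
    return f((lo + hi) / 2.0)

def bernoulli_log_mgf(p: float) -> Callable[[float], float]:
    """Lambda(theta) = log(1 - p + p * e^theta) for a Bernoulli(p) random variable."""
    return lambda t: math.log(1.0 - p + p * math.exp(t))

def bernoulli_rate(p: float, a: float) -> float:
    """Closed-form rate function of Bernoulli(p), valid for 0 < a < 1."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

if __name__ == "__main__":
    print(legendre_transform(bernoulli_log_mgf(0.3), 0.6), bernoulli_rate(0.3, 0.6))
```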

As mentioned earlier, in Sections 3 and 4 we point out some drawbacks of standard approaches to controlling the probability of false selection that rely on estimating the large deviations rate function. To keep the discussion simple there and in this section, we consider the case where $d = 2$ and an equal number of samples is allocated to each population. This case further simplifies to that of a single random variable $X$ (the difference of the two populations' outputs) with an unknown mean $EX$, and our interest is in establishing either $EX > 0$ or $EX < 0$ with error probability at most $\delta$, asymptotically as $\delta \to 0$.

Note that this determination may be based on the sign of the sample average $\bar X_n$ of $n$ iid samples of $X$. Then, if $EX < 0$, we may take $\exp(-nI(0))$ as a proxy for $P(\bar X_n > 0)$, the probability of false selection. Similarly, if $EX > 0$, we may again take $\exp(-nI(0))$ as a proxy for $P(\bar X_n < 0)$, the probability of false selection. Thus, $n = \lceil\log(1/\delta)/I(0)\rceil$ samples ensure that $P(FS) \le \delta$. Since $I(0)$ is typically unknown, an estimator for it has to be constructed from the empirically generated samples.

2.1 Empirical estimator for $I(0)$

Without loss of generality we assume that $EX < 0$ and that the random variable $X$ is not degenerate. Recall that $\Lambda(\theta)$ denotes the log-moment generating function $\log E\exp(\theta X)$. It is strictly convex with $\Lambda(0) = 0$ and $\Lambda'(0) = EX < 0$. Furthermore,

$$I(0) = -\inf_{\theta}\Lambda(\theta).$$

A natural estimator for $I(0)$ based on samples $(X_i : 1 \le i \le m)$ is

$$I_m(0) = -\inf_{\theta\in\mathbb{R}}\Lambda_m(\theta),$$

where

$$\Lambda_m(\theta) = \log\Big(\frac{1}{m}\sum_{i=1}^m \exp(\theta X_i)\Big).$$

We now conduct a large deviations analysis of the estimator $I_m(0)$. Duffy and Metcalfe (2005) conduct a large deviations analysis of the empirical rate function on the functional space. Our analysis focusses on the empirical rate function evaluated at a specified value and provides greater intuitive insight into the large deviations event.
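A minimal sketch of the estimator $I_m(0) = -\inf_\theta \Lambda_m(\theta)$ follows. It assumes that the search over $\theta$ is restricted to $[-\theta_{\max}, \theta_{\max}]$ (a choice made for this sketch, not in the text). Since $\Lambda_m$ is convex, ternary search applies, and a log-sum-exp shift keeps $\Lambda_m$ numerically stable. When all samples share one sign, $\Lambda_m$ is monotone and the true $I_m(0)$ is $+\infty$; the sketch then returns a finite value that grows with `theta_max`, which flags the truncation. The check uses $X \sim N(-1, 1)$, for which $\Lambda(\theta) = -\theta + \theta^2/2$ and $I(0) = 1/2$. The function names are hypothetical.

```python
import math
import random
from typing import Sequence

def empirical_log_mgf(theta: float, xs: Sequence[float]) -> float:
    """Lambda_m(theta) = log((1/m) * sum_i exp(theta * X_i)), via a log-sum-exp shift."""
    vals = [theta * x for x in xs]
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals) / len(xs))

def empirical_rate_at_zero(xs: Sequence[float], theta_max: float = 20.0,
                           iters: int = 60) -> float:
    """I_m(0) = -inf_theta Lambda_m(theta), with theta restricted to [-theta_max, theta_max].

    Lambda_m is convex, so ternary search finds its minimum on the interval.
    If all samples share a sign, the true I_m(0) is +infinity and this returns
    a finite value that grows linearly in theta_max instead.
    """
    lo, hi = -theta_max, theta_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if empirical_log_mgf(m1, xs) < empirical_log_mgf(m2, xs):
            hi = m2  # the minimiser is not in [m2, hi]
        else:
            lo = m1  # the minimiser is not in [lo, m1]
    return -empirical_log_mgf((lo + hi) / 2.0, xs)

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(-1.0, 1.0) for _ in range(10_000)]  # X ~ N(-1, 1), so I(0) = 0.5
    print(empirical_rate_at_zero(xs))
```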
Figure 1: Graphical description of the log-moment generating function $\Lambda(\cdot)$ and the large deviations rate function at zero, $I(0) = -\inf_\theta \Lambda(\theta)$.

Figure 2: Examples of empirically estimated log-moment generating functions: one where the sample mean is negative, the other where it is positive.
Readers more interested in the ordinal optimisation discussion may skip the discussion below and go directly to Section 3 on a first reading. Readers willing to skip the large deviations analysis may go directly to Section 5 for our key negative result and to Sections 6 and 7 for more positive conclusions.

2.2 Large deviations analysis for $I_m(0)$

Theorem 1 below states a large deviations result for the empirically estimated large deviations rate function. We need the following light-tailed assumption on the distribution of $X$.

Assumption 1 For any $K > 0$, there exists a $u > 0$ such that

$$\limsup_{m\to\infty}\frac{1}{m}\log P(\bar X_m \ge u) \le -K$$

and

$$\limsup_{m\to\infty}\frac{1}{m}\log P(\bar X_m \le -u) \le -K.$$


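Theorem 1 Under Assumption 1, for $a > I(0)$,

$$\lim_{m\to\infty}\frac{1}{m}\log P(I_m(0) \ge a) = \lim_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in\mathbb{R}}\frac{1}{m}\sum_{i=1}^m e^{\theta X_i} \le e^{-a}\Big) = -\inf_{\theta\in\mathbb{R}} I_\theta(e^{-a}), \qquad (4)$$

where

$$I_\theta(\nu) = \sup_{\alpha\in\mathbb{R}}\big(\alpha\nu - \log E\exp(\alpha e^{\theta X})\big)$$

is the large deviations rate function of $e^{\theta X}$.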


Corollary 1 From the proof of Theorem 1 it is easily seen that for $a > -\inf_{\theta\in A}\Lambda(\theta)$,

$$\lim_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in A}\Lambda_m(\theta) \le -a\Big) = -\inf_{\theta\in A} I_\theta(e^{-a}) \qquad (5)$$

for any closed interval $A$.
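The rate function $I_\theta(\nu)$ in (4) and (5) is the Cramér rate function of $e^{\theta X}$, so for a finite-support $X$ it can be evaluated by a concave maximisation over $\alpha$. The sketch below assumes $X$ takes values `xs` with probabilities `ps` and that the optimising $\alpha$ lies in $[-\alpha_{\max}, \alpha_{\max}]$, a truncation introduced here (when $\nu$ lies outside the range of $e^{\theta X}$ the true value is $+\infty$ and the sketch returns a large finite number instead). Combined with a grid over $\theta$ it gives a crude evaluation of the right-hand side of (4). All names are hypothetical.

```python
import math
from typing import Sequence

def log_mgf_of_exp_theta_x(alpha: float, theta: float,
                           xs: Sequence[float], ps: Sequence[float]) -> float:
    """log E exp(alpha * e^{theta X}) for a finite-support X, via log-sum-exp."""
    vals = [alpha * math.exp(theta * x) for x in xs]
    top = max(vals)
    return top + math.log(sum(p * math.exp(v - top) for p, v in zip(ps, vals)))

def rate_exp_theta_x(nu: float, theta: float, xs: Sequence[float], ps: Sequence[float],
                     alpha_max: float = 200.0, iters: int = 200) -> float:
    """I_theta(nu) = sup_alpha (alpha * nu - log E exp(alpha * e^{theta X})), alpha in [-alpha_max, alpha_max]."""
    f = lambda a: a * nu - log_mgf_of_exp_theta_x(a, theta, xs, ps)
    lo, hi = -alpha_max, alpha_max
    for _ in range(iters):  # ternary search; f is concave in alpha
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2.0)

def theorem1_rhs(a: float, xs: Sequence[float], ps: Sequence[float],
                 thetas: Sequence[float]) -> float:
    """Crude grid approximation of -inf_theta I_theta(e^{-a}), the RHS of (4)."""
    return -min(rate_exp_theta_x(math.exp(-a), t, xs, ps) for t in thetas)

if __name__ == "__main__":
    xs, ps = [-1.0, 1.0], [0.55, 0.45]
    print(theorem1_rhs(0.1, xs, ps, [k / 10 for k in range(1, 31)]))
```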

2.2.1 $P(I_m(0) \le a)$ for $0 < a < I(0)$

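The large deviations upper bound for $P(I_m(0) \le a)$, for $0 < a < I(0)$, is easily seen. Let $\Theta_a = \{\theta : \Lambda(\theta) \le -a\}$. It is clearly a non-empty interval. Note that

$$P(I_m(0) \le a) = P\Big(\inf_{\theta}\Lambda_m(\theta) \ge -a\Big) \le \inf_{\theta\in\mathbb{R}} P(\Lambda_m(\theta) \ge -a) \le \inf_{\theta\in\mathbb{R}}\exp\Big(-m\inf_{x \ge e^{-a}} I_\theta(x)\Big).$$

Note that $I_\theta(e^{\Lambda(\theta)}) = 0$ and that $e^{\Lambda(\theta)} \le e^{-a}$ for $\theta \in \Theta_a$, so that $\inf_{x \ge e^{-a}} I_\theta(x) = I_\theta(e^{-a})$ for such $\theta$. So

$$\limsup_{m\to\infty}\frac{1}{m}\log P(I_m(0) \le a) \le -\sup_{\theta\in\Theta_a} I_\theta(e^{-a})$$

follows.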


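The lower bound requires greater technicalities. To keep the analysis simple we make Assumption 2 below. Let $\mathcal{D}_\Lambda = \{\theta : \Lambda(\theta) < \infty\}$.

Assumption 2 For $0 < a < I(0)$, let $\Theta_a = [\underline\theta_a, \bar\theta_a]$ and $\Theta_a \subset \mathcal{D}_\Lambda$.

Assumption 2 implies that $\underline\theta_a > 0$ and $\Lambda(\underline\theta_a) = \Lambda(\bar\theta_a) = -a$.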

Below, in Proposition 1, we consider the case where $P(X > x)$ for large $x$ is bounded from below by a term exponentially decaying in $x$. Theorem 2 later considers the case where this is not true.

Proposition 1 Suppose that Assumption 2 holds, and

$$P(X > x) \ge \exp(-\lambda x)$$

for some $\lambda > 0$ and all $x$ sufficiently large. Then,

$$\lim_{m\to\infty}\frac{1}{m}\log P(I_m(0) \le a) = 0. \qquad (6)$$

Furthermore,

$$\sup_{\theta\in\Theta_a} I_\theta(e^{-a}) = 0. \qquad (7)$$


We now consider $P(I_m(0) \le a)$, $0 < a < I(0)$, when $I_\theta(e^{-a}) > 0$ for $\theta \in (\underline\theta_a, \bar\theta_a)$. Specifically, we require Assumption 3 below for further analysis. Let

$$\Omega = \{(\alpha, \theta) : E\exp(\alpha e^{\theta X}) < \infty\}.$$

Assumption 3 For $0 < a < I(0)$, there exist positive $(\alpha^*, \theta^*) \in \Omega^\circ$ that uniquely maximize

$$f(\alpha, \theta) = \alpha e^{-a} - \log E\exp(\alpha e^{\theta X}). \qquad (8)$$

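Remark 1 It is easy to check that $(\alpha^*, \theta^*)$ satisfy the following first order conditions for maximising $f(\alpha, \theta)$:

$$e^{-a} = \frac{E e^{\theta^* X}\exp(\alpha^* e^{\theta^* X})}{E\exp(\alpha^* e^{\theta^* X})} \qquad (9)$$

and

$$E X e^{\theta^* X}\exp(\alpha^* e^{\theta^* X}) = 0. \qquad (10)$$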

Also note that for $\theta \in \Theta_a^\circ$,

$$I_\theta(e^{-a}) = \alpha(\theta)e^{-a} - \log E[\exp(\alpha(\theta)e^{\theta X})],$$

where $\alpha(\theta)$ uniquely solves

$$e^{-a} = \frac{E e^{\theta X}\exp(\alpha e^{\theta X})}{E\exp(\alpha e^{\theta X})}.$$

Furthermore, differentiating $I_\theta(e^{-a})$ with respect to $\theta$ and setting the derivative equal to zero, it can be seen that

$$\sup_{\theta\in\Theta_a} I_\theta(e^{-a}) = I_{\theta^*}(e^{-a}) = \alpha^* e^{-a} - \log E[\exp(\alpha^* e^{\theta^* X})].$$

Theorem 2 Under Assumptions 2 and 3, for $0 < a < I(0)$,

$$\lim_{m\to\infty}\frac{1}{m}\log P(I_m(0) \le a) = -I_{\theta^*}(e^{-a}) = -\sup_{\theta\in\Theta_a} I_\theta(e^{-a}). \qquad (11)$$

Remark 2 A large deviations analysis for the empirical large deviations rate estimator of $I(x)$ follows analogously. Note that $I(x)$ is just the rate function for $X - x$ evaluated at zero. A natural estimator for $I(x)$ is

$$I_m(x) = \sup_{\theta\in\mathbb{R}}(\theta x - \Lambda_m(\theta)).$$

Furthermore, under similar technical conditions, for $a > I(x)$,

$$\lim_{m\to\infty}\frac{1}{m}\log P(I_m(x) \ge a) = \lim_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta}\frac{1}{m}\sum_{i=1}^m e^{\theta(X_i - x)} \le e^{-a}\Big) = -\inf_{\theta} I_\theta(e^{-a+\theta x}). \qquad (12)$$

Further, for $a < I(x)$,

$$\lim_{m\to\infty}\frac{1}{m}\log P(I_m(x) \le a) = -\sup_{\theta\in\Theta_a} I_\theta(e^{-a+\theta x}), \qquad (13)$$

where now $\Theta_a = [\underline\theta_a, \bar\theta_a]$ with $0 < \underline\theta_a < \bar\theta_a$ such that $\Lambda(\underline\theta_a) = -a + \underline\theta_a x$ and $\Lambda(\bar\theta_a) = -a + \bar\theta_a x$.


3 Two phase ordinal optimisation implementation 

In this section we consider a two phase procedure to determine whether $EX > 0$ or $EX < 0$. In the first phase, using $O(\log(1/\delta))$ samples, $I(0)$ is estimated. In the second phase, this estimator is used as a proxy for $I(0)$ in deciding the number of samples to generate. We also illustrate the poor performance of this approach in some settings.

The specific two phase procedure is as follows:

• First phase - Generate $m = \lceil c_1\log(1/\delta)\rceil$ independent samples of $X$ to estimate $I(0)$ by $I_m(0)$, for some $c_1 > 0$.

• Second phase - Generate $N = \lceil c_2 m/I_m(0)\rceil$ (approximately $c_2 c_1\log(1/\delta)/I_m(0)$) independent samples of $X$, for some $c_2 > 0$.

• Decide the sign of $EX$ based on whether $\bar X_N > 0$ or $\bar X_N < 0$, where $\bar X_N$ is the average of the second phase samples.
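The procedure above can be sketched as follows. This is an illustration under stated assumptions: a user-supplied `sample()` returns one draw of $X$; $I_m(0)$ is computed by ternary search over a bounded interval $[-\theta_{\max}, \theta_{\max}]$ (when all first phase samples share a sign, the true estimate is $+\infty$ and the capped value leads to a single second phase sample); and `max_n` is a hypothetical cap guarding against $I_m(0) \approx 0$. Ties $\bar X_N = 0$ are resolved as negative, an arbitrary choice. All names are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple

def _log_mean_exp(theta: float, xs: Sequence[float]) -> float:
    """Lambda_m(theta) = log((1/m) * sum_i exp(theta * X_i)), computed stably."""
    vals = [theta * x for x in xs]
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals) / len(xs))

def empirical_rate_at_zero(xs: Sequence[float], theta_max: float = 20.0, iters: int = 60) -> float:
    """I_m(0) by ternary search on the convex function Lambda_m over [-theta_max, theta_max]."""
    lo, hi = -theta_max, theta_max
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if _log_mean_exp(m1, xs) < _log_mean_exp(m2, xs):
            hi = m2
        else:
            lo = m1
    return -_log_mean_exp((lo + hi) / 2.0, xs)

def two_phase(sample: Callable[[], float], delta: float, c1: float = 1.0,
              c2: float = 1.0, max_n: int = 10**7) -> Tuple[int, int]:
    """Return (estimated sign of EX, total number of samples used)."""
    m = math.ceil(c1 * math.log(1.0 / delta))
    first = [sample() for _ in range(m)]  # phase 1: estimate I(0)
    rate = empirical_rate_at_zero(first)
    # Phase 2: N = ceil(c2 * m / I_m(0)) fresh samples, capped because I_m(0) can be ~0.
    n = max(1, min(max_n, math.ceil(c2 * m / max(rate, 1e-12))))
    total = sum(sample() for _ in range(n))
    return (1 if total > 0 else -1), m + n

if __name__ == "__main__":
    rng = random.Random(0)
    p, delta, runs = 0.55, 1e-3, 2000
    draw = lambda: -1.0 if rng.random() < p else 1.0  # two-point X with EX = -0.1
    errors = sum(two_phase(draw, delta)[0] > 0 for _ in range(runs))
    print("estimated P(FS) / delta:", errors / runs / delta)
```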

Remark 3 In the above procedure,

$$P(FS) = P\Big(\sum_{i=1}^N X_i > 0\Big) \approx E\exp(-N I(0)).$$

Thus, large values of $I_m(0)$ lead to undersampling in the second phase and contribute the most to $P(FS)$. Note that individual $X_i$'s taking large values do not contribute significantly to the estimator

$$I_m(0) = -\inf_{\theta}\log\Big(\frac{1}{m}\sum_{i=1}^m \exp(\theta X_i)\Big)$$

taking large values. This may appear counter-intuitive when $P(X > x) \sim \exp(-\lambda x)$ for large $x$ and some $\lambda > 0$, as then the $\exp(\theta X_i)$ are heavy-tailed (see, e.g., Foss, Korshunov and Zachary 2011 for an introduction to heavy-tailed distributions), and it is well known that large deviations of $\sum_{i=1}^m \exp(\theta X_i)$, i.e., this sum taking an unusually large value, are governed by the largest term in the sum. However, $I_m(0)$ takes unusually large values when $\inf_\theta \frac{1}{m}\sum_{i=1}^m \exp(\theta X_i)$ takes unusually small values, and here large values taken by individual $X_i$'s have little impact.

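Theorem 3 For $N = \lceil m/I_m(0)\rceil$, under Assumption 1,

$$\liminf_{m\to\infty}\frac{1}{m}\log P\Big(\sum_{i=1}^N X_i > 0\Big) \ge -\inf_{b > 0}\Big(\frac{I(0)}{b} + \inf_{\theta} I_\theta(e^{-b})\Big). \qquad (14)$$

In particular, for $c_1 = c_2 = 1$,

$$\liminf_{\delta\to 0} P\Big(\sum_{i=1}^N X_i > 0\Big)\delta^{-1} > 1. \qquad (15)$$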

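Remark 4 Consider the term

$$c_1\inf_{\gamma > 0}\Big(\frac{c_2 I(0)}{\gamma} + \inf_{\theta} I_\theta(e^{-\gamma})\Big). \qquad (16)$$

It is easy to see that since

$$I_\theta(e^{-\gamma}) = \sup_{\alpha\in\mathbb{R}}\big(-\alpha e^{-\gamma} - \log E\exp(-\alpha e^{\theta X})\big), \qquad (17)$$

the solutions $\theta$ and $\alpha$ to $\inf_\theta I_\theta(e^{-\gamma})$ uniquely solve the equations

$$e^{-\gamma} = \frac{E e^{\theta X}\exp(-\alpha e^{\theta X})}{E\exp(-\alpha e^{\theta X})} \quad\text{and}\quad E X e^{\theta X}\exp(-\alpha e^{\theta X}) = 0. \qquad (18)$$

Combining this with (16), it can be seen that the $(\gamma, \theta, \alpha)$ that optimise (16) uniquely solve (18) and

$$\alpha e^{-\gamma} = \frac{c_2 I(0)}{\gamma^2}. \qquad (19)$$

Now consider a random variable $X$ that takes values $-b$ and $b$, $b > 0$, with probabilities $p$ and $1-p$, respectively, with $p > 0.5$. Clearly then $EX = -(2p-1)b < 0$. Solving equations (17), (18) and (19), it can be seen that the RHS of (14) is independent of the value of $b$; hence any indifference zone considerations can be met by scaling $b$, no matter how close $p$ is to 0.5.

For $c_1 = c_2 = 1$ and $p = 0.55, 0.52, 0.51$, the infimum on the RHS of (14) equals 0.105, 0.047 and 0.025, respectively. Since $m = \log(1/\delta)$ here, (14) then gives $P(FS) \ge \delta^{0.105 + o(1)}$, $\delta^{0.047 + o(1)}$ and $\delta^{0.025 + o(1)}$, far larger than $\delta$. It can be seen that as $p \downarrow 0.5$, $\gamma$, $\theta$ and this infimum decrease to zero, and $\alpha \to 1$.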
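The independence from $b$ is visible already at the level of the estimator: replacing $X_i$ by $bX_i$ with $b > 0$ only reparametrises $\theta$ in $\Lambda_m(\theta)$, so $I_m(0)$ is unchanged. For two-point samples in which a fraction $q$ equals $-b$, minimising $\Lambda_m(\theta) = \log(qe^{-\theta b} + (1-q)e^{\theta b})$ at $\theta = \log(q/(1-q))/(2b)$ gives $I_m(0) = -\tfrac12\log(4q(1-q))$ for every $b$. The sketch below checks this numerically, using a bounded ternary search for $I_m(0)$ (an implementation choice of this sketch; the minimiser must lie inside the search interval). The function names are hypothetical.

```python
import math
from typing import Sequence

def empirical_log_mgf(theta: float, xs: Sequence[float]) -> float:
    """Lambda_m(theta) = log((1/m) * sum_i exp(theta * X_i)), computed stably."""
    vals = [theta * x for x in xs]
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals) / len(xs))

def empirical_rate_at_zero(xs: Sequence[float], theta_max: float = 20.0, iters: int = 60) -> float:
    """I_m(0) = -inf_theta Lambda_m(theta) by ternary search on [-theta_max, theta_max]."""
    lo, hi = -theta_max, theta_max
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if empirical_log_mgf(m1, xs) < empirical_log_mgf(m2, xs):
            hi = m2
        else:
            lo = m1
    return -empirical_log_mgf((lo + hi) / 2.0, xs)

def two_point_rate(q: float) -> float:
    """Closed form of I_m(0) when a fraction q of the samples equals -b and the rest equal +b."""
    return -0.5 * math.log(4.0 * q * (1.0 - q))

if __name__ == "__main__":
    xs = [-1.0] * 55 + [1.0] * 45  # q = 0.55
    for b in (0.5, 1.0, 3.0):
        print(b, empirical_rate_at_zero([b * x for x in xs]), two_point_rate(0.55))
```

For $q$ near 0.5 this estimate is close to zero, so the second phase draws many samples; the damaging event is rather a first phase sample with $q$ far from $1/2$, which inflates $I_m(0)$ and shrinks $N$.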


4 Sequential ordinal optimisation implementation 

We now critique a natural sequential procedure for deciding whether $EX > 0$ or $EX < 0$ that relies on the large deviations rate function estimator $I_m(0)$ for a stopping rule, uses $O(\log(1/\delta))$ computational effort, and attempts to control the probability of false selection to within $\delta$ for small $\delta$. Further, we identify some light-tailed distributions for which the algorithm behaves poorly. (See, e.g., Branke, Chick and Schmidt 2007, Goldsman et al. 2002 for sequential procedures used under the Gaussian assumption in the ranking and selection simulation literature.)

Again, without loss of generality we assume that $EX < 0$. Recall that in this case $\exp(-mI_m(0))$ is a reasonable proxy for the probability of false selection.

Consider the following procedure: 

• For $c_1 > 0$, generate $m_1 = c_1\log(1/\delta)$ independent samples $(X_i : i \le m_1)$ of $X$ in the first phase to estimate $I(0)$ by $I_{m_1}(0)$.

• If $\exp(-m_1 I_{m_1}(0)) \le \delta$, terminate, and conclude that the sign of $EX$ is given by the sign of $\bar X_{m_1}$.

• Else, generate another $c_2\log(1/\delta)$ independent samples of $X$, where $c_2$ may be determined adaptively based on the outcome of the first $m_1$ samples. Set $m_2 = (c_1 + c_2)\log(1/\delta)$ and again terminate if $\exp(-m_2 I_{m_2}(0)) \le \delta$.
Figure 3: This figure illustrates an empirical log-moment generating function that would lead to a wrong conclusion (i.e., $EX > 0$) in the sequential algorithm.

• This procedure may then be repeated with $c_3$, $c_4$ and so on until the termination criterion is met.
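A sketch of this sequential rule follows. It assumes, for simplicity, a constant increment $c_k = c$ for all later phases (the text allows adaptive choices), a user-supplied `sample()` function, a bounded ternary search over $\theta$ as a stand-in for the exact infimum in $I_m(0)$, and a hypothetical cap `max_phases` so that the loop always ends. The stopping test is written as $m I_m(0) \ge \log(1/\delta)$, which is equivalent to $\exp(-mI_m(0)) \le \delta$ but avoids underflow. All names are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple

def _log_mean_exp(theta: float, xs: Sequence[float]) -> float:
    """Lambda_m(theta) = log((1/m) * sum_i exp(theta * X_i)), computed stably."""
    vals = [theta * x for x in xs]
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals) / len(xs))

def empirical_rate_at_zero(xs: Sequence[float], theta_max: float = 20.0, iters: int = 60) -> float:
    """I_m(0) by ternary search on the convex function Lambda_m over [-theta_max, theta_max]."""
    lo, hi = -theta_max, theta_max
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if _log_mean_exp(m1, xs) < _log_mean_exp(m2, xs):
            hi = m2
        else:
            lo = m1
    return -_log_mean_exp((lo + hi) / 2.0, xs)

def sequential(sample: Callable[[], float], delta: float, c1: float = 1.0,
               c: float = 1.0, max_phases: int = 1000) -> Tuple[int, int]:
    """Return (estimated sign of EX, total number of samples used)."""
    log_inv = math.log(1.0 / delta)
    xs = [sample() for _ in range(math.ceil(c1 * log_inv))]
    for _ in range(max_phases):
        # Terminate once exp(-m * I_m(0)) <= delta, i.e. m * I_m(0) >= log(1/delta).
        if len(xs) * empirical_rate_at_zero(xs) >= log_inv:
            break
        xs.extend(sample() for _ in range(math.ceil(c * log_inv)))
    mean = sum(xs) / len(xs)
    return (1 if mean > 0 else -1), len(xs)

if __name__ == "__main__":
    rng = random.Random(0)
    draw = lambda: -1.0 if rng.random() < 0.6 else 1.0  # two-point X with EX = -0.2
    print(sequential(draw, delta=0.01))  # (estimated sign, samples used)
```

Nothing in the stopping test checks the sign of $\bar X_m$: a first batch with an unusually large $I_m(0)$ attained at some $\theta < 0$ stops the procedure with the wrong sign, which is the failure mode analysed next.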

We now identify distributions for which this procedure would wrongly terminate in the first phase itself, even for large values of $c_1$. Recall that $FS$ denotes the event that the proposed algorithm concludes falsely.

For such distributions we clearly need $I(0) < 1/c_1$. Otherwise,

$$P(\bar X_{m_1} > 0) \le \exp(-m_1 I(0)) \le \delta.$$

In particular, we have $0 > \inf_\theta \Lambda(\theta) > -1/c_1$.

Proposition 2 Suppose that there exists a $\theta < 0$ such that

$$I_\theta(e^{-1/c_1}) < 1/c_1. \qquad (20)$$

Then,

$$\liminf_{\delta\to 0}\delta^{-1}P(FS) = \infty. \qquad (21)$$
Remark 5 To see numerical examples where (20) holds, consider a random variable $X = K - Y$, where $K$ is a constant and $Y$ is exponentially distributed with rate $\lambda$ such that $\lambda K < 1$, so that $EX < 0$.

Note that for $\theta > 0$ and $\nu < e^{\Lambda(-\theta)}$,

$$I_{-\theta}(\nu) = \sup_{\alpha \ge 0}\Big(-\alpha\nu - \log E[\exp(-\alpha\exp(-\theta K)\exp(\theta Y))]\Big).$$

This in turn can be seen to equal

$$\sup_{\alpha \ge 0}\Big(-\alpha\exp(\theta K)\nu - \log E[\exp(-\alpha\exp(\theta Y))]\Big).$$

Let $Z(\theta) = \exp(\theta Y)$. Then it is easily seen that the density function of $Z(\theta)$ at $z > 1$ equals

$$\frac{\lambda}{\theta}\, z^{-(\lambda/\theta + 1)}.$$

Using this, we get

$$I_{-\theta}(e^{-1/c_1}) = -\alpha^*\exp(\theta K - 1/c_1) - \log\Big(\frac{\lambda}{\theta}\int_1^\infty \exp(-\alpha^* z)\, z^{-(\lambda/\theta + 1)}\,dz\Big),$$

where $\alpha^*$ solves the equation

$$\exp(\theta K - 1/c_1) = \frac{\int_1^\infty \exp(-\alpha z)\, z^{-\lambda/\theta}\,dz}{\int_1^\infty \exp(-\alpha z)\, z^{-(\lambda/\theta + 1)}\,dz}. \qquad (22)$$

Now fixing $\lambda = 1$ and $K = 0.96$, we numerically see that for $c_1 = 2$ and $\theta = 2.133$, $\alpha^* = 0.0607$ solves (22) with $I_{-\theta}(e^{-1/c_1}) = 0.2231 < 1/c_1$. For $c_1 = 5$ and $\theta = 0.987$, $\alpha^* = 0.201$ solves (22) with $I_{-\theta}(e^{-1/c_1}) = 0.1259 < 1/c_1$. For $c_1 = 100$ and $\theta = 0.129$, $\alpha^* = 1.1792$ solves (22) with $I_{-\theta}(e^{-1/c_1}) = 0.005425 < 1/c_1$.
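The density of $Z(\theta) = e^{\theta Y}$ follows from the tail identity $P(Z(\theta) > z) = P(Y > \log z/\theta) = z^{-\lambda/\theta}$ for $z \ge 1$, i.e. $Z(\theta)$ is Pareto with index $\lambda/\theta$, which is why $e^{\theta X}$ is heavy-tailed here. A quick Monte Carlo sketch of this identity, with arbitrarily chosen $\lambda$, $\theta$ and $z$ (the function names are hypothetical):

```python
import math
import random

def pareto_tail(z: float, lam: float, theta: float) -> float:
    """P(exp(theta * Y) > z) = z ** (-lam / theta) for Y ~ Exp(lam), theta > 0, z >= 1."""
    return z ** (-lam / theta)

def monte_carlo_tail(z: float, lam: float, theta: float, n: int = 200_000, seed: int = 0) -> float:
    """Empirical frequency of exp(theta * Y) > z over n draws of Y ~ Exp(lam)."""
    rng = random.Random(seed)
    hits = sum(math.exp(theta * rng.expovariate(lam)) > z for _ in range(n))
    return hits / n

if __name__ == "__main__":
    print(pareto_tail(4.0, 1.0, 2.0), monte_carlo_tail(4.0, 1.0, 2.0))
```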


5 The key negative result 

In this section we consider for simplicity a two population situation (the case of $d > 2$ populations is easily handled) and assume that the random outputs from both populations belong to $\mathcal{L}$, where $\mathcal{L}$ denotes any collection of distribution functions $G$ with finite mean.

Consider a policy $\mathcal{P}(\varepsilon,\delta)$ operating on distributions in $\mathcal{L}$. By this we mean that, given $F, G \in \mathcal{L}$ such that the absolute value of the difference of their means exceeds $\varepsilon > 0$, the policy $\mathcal{P}(\varepsilon,\delta)$ generates samples from the two distributions, adaptively (based on the values of the generated samples) deciding which distribution to sample from next, and at some stage (at a stopping time) selects one of the two populations as the one with the lower mean. The policy guarantees that the probability of false selection satisfies $P(FS) \le \delta$. In this section we consider a bigger set of policies $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ that differ from $\mathcal{P}(\varepsilon,\delta)$ in that they only guarantee that $P(FS)$ is asymptotically bounded from above by $\delta$. That is,

$$\limsup_{\delta\to 0} P(FS)\,\delta^{-1} \le 1.$$

In many applications it may be difficult to design a $\mathcal{P}(\varepsilon,\delta)$ policy; however, designing a $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ policy may be easier, and often a practitioner may be satisfied with asymptotic guarantees. (This is analogous to accepting central limit theorem based confidence intervals for population means, where the coverage guarantees are valid only asymptotically; see, e.g., Glynn and Whitt 1992.) Note that the $\mathcal{P}(\varepsilon,\delta)$ policies are a subset of the $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ policies.

5.1 Analysis 

Recall that the Kullback-Leibler distance between distributions $G$ and $\tilde G$ is defined as

$$I(G, \tilde G) = \int \log\Big(\frac{dG}{d\tilde G}(x)\Big)\, dG(x).$$

Suppose that sequences of random variables $(X_i : i \ge 1)$ and $(Y_i : i \ge 1)$ defined on a probability space are i.i.d. and independent of each other under probability measures $P_a$ and $P_b$.

Under probability measure $P_a$,

• the distribution of $X_i$ is $F$ with mean $\mu_F$;

• that of $Y_i$ is $G$ with mean $\mu_G$, where $\mu_G < \mu_F - \varepsilon$ ($\varepsilon > 0$).

Under $P_b$,

• the distribution of $X_i$ is $F$;

• that of $Y_i$ is $\tilde G$, where $\mu_{\tilde G} > \mu_F + \varepsilon$, and $G$ and $\tilde G$ are absolutely continuous with respect to each other (denoted by $G \sim \tilde G$);

• further, the Kullback-Leibler distance $I(G, \tilde G) < \infty$.

The policy $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ operating on the populations $(X_i : i \ge 1)$ and $(Y_i : i \ge 1)$, for every $\delta > 0$, adaptively generates samples $(X_i : i \le T_1(\delta))$ and $(Y_i : i \le T_2(\delta))$ before concluding with the random variable $J(\delta) = 1$ if $\{X_i\}$ is deemed to have the lower mean, and $J(\delta) = 2$ otherwise. Let $E_a$ ($E_b$) denote the expectation operator under $P_a$ ($P_b$).

Under $P_a$, policy $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ will erroneously conclude that distribution $F$ has the lower mean with error probability $P_a(J(\delta) = 1)$ such that

$$\limsup_{\delta\to 0} P_a(J(\delta) = 1)\,\delta^{-1} \le 1.$$

Under $P_b$ it will erroneously conclude that $\tilde G$ has the lower mean with error probability satisfying

$$\limsup_{\delta\to 0} P_b(J(\delta) = 2)\,\delta^{-1} \le 1.$$

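The proof of Lemma 1 is similar in spirit to analogous results in Lai and Robbins (1985) and Tsitsiklis and Mannor (2002).

Lemma 1 Under $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ operating on $\mathcal{L}$,

$$\liminf_{\delta\to 0}\frac{E_a T_2(\delta)}{\log(1/\delta)} \ge \sup_{\tilde G\in\mathcal{L}:\ \mu_{\tilde G} > \mu_F + \varepsilon,\ \tilde G \sim G}\frac{1}{I(G, \tilde G)}. \qquad (23)$$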

Lemma 2 illustrates that if $G$ has unbounded support on the positive real line, that is, $G(x) < 1$ for all $x \in \mathbb{R}$, then one can always find a $\tilde G$ with arbitrarily high mean such that $I(G, \tilde G)$ is arbitrarily small.


Lemma 2 Given $G$ with finite mean $\mu_G$ and unbounded support on the positive real line, for any $a > 0$ and $k > \mu_G$ there exists a distribution $G_k \in \mathcal{L}$ such that the Kullback-Leibler distance between $G$ and $G_k$ satisfies

$$I(G, G_k) = \int_{\mathbb{R}}\log\Big(\frac{dG(x)}{dG_k(x)}\Big)\, dG(x) < a \qquad (24)$$

and

$$\mu_{G_k} = \int_{\mathbb{R}} x\, dG_k(x) > k. \qquad (25)$$
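To see why such a $G_k$ exists, consider one possible construction (an illustration assumed here, not necessarily the one used in the appendix). Take $G$ to be the Exp(1) distribution, which is unbounded above, and let $G_k = (1-\eta)G + \eta G_{(L)}$, where $G_{(L)}$ is $G$ conditioned on $(L, \infty)$. Then $dG_k/dG$ equals $1-\eta$ on $[0, L]$ and $1-\eta+\eta e^{L}$ on $(L, \infty)$, so $G \sim G_k$, $\mu_{G_k} = (1-\eta) + \eta(L+1)$, and $I(G, G_k) = -(1-e^{-L})\log(1-\eta) - e^{-L}\log(1-\eta+\eta e^{L}) \le \log(1/(1-\eta))$. Choosing $\eta$ small makes the divergence small, and then choosing $L$ large makes the mean large. The function names are hypothetical.

```python
import math

def mixture_mean(eta: float, L: float) -> float:
    """Mean of G_k = (1 - eta) * Exp(1) + eta * (Exp(1) conditioned on X > L)."""
    # E[X | X > L] = L + 1 for Exp(1), by memorylessness.
    return (1.0 - eta) + eta * (L + 1.0)

def kl_exp_vs_mixture(eta: float, L: float) -> float:
    """I(G, G_k) = int log(dG/dG_k) dG with G = Exp(1) and G_k as above."""
    tail = math.exp(-L)  # P_G(X > L); underflows harmlessly to 0.0 for very large L
    # log(1 - eta + eta * e^L), rewritten as L + log(eta + (1 - eta) e^{-L}) to avoid overflow.
    log_density_ratio_tail = L + math.log(eta + (1.0 - eta) * tail)
    return -(1.0 - tail) * math.log(1.0 - eta) - tail * log_density_ratio_tail

if __name__ == "__main__":
    eta = 0.01
    for L in (10.0, 100.0, 1000.0):
        kl = kl_exp_vs_mixture(eta, L)
        print(L, mixture_mean(eta, L), kl, 1.0 / kl)  # 1/kl: this G_k's term in the sup of (23)
```

As $L$ grows the mean increases without bound while the divergence stays near $\eta$, so the term $1/I(G, G_k)$ in (23) stays near $1/\eta$; letting $\eta \to 0$ then makes it arbitrarily large.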


Theorem 4 below is the main result of this section. Let $\mathcal{L}$ denote a collection of probability distributions on the real line with finite mean that are not bounded from above and such that, for any $a, k > 0$ and $G \in \mathcal{L}$, $\mathcal{L}$ includes a $G_k$ satisfying (24) and (25). One example of such an $\mathcal{L}$ is the collection of all distributions whose moment generating function is finite in an open neighbourhood of zero and that are not bounded from above (or from below; see Remark 6).


Theorem 4 Under $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ operating on $\mathcal{L}$,

$$\liminf_{\delta\to 0}\frac{E_a T_2(\delta)}{\log(1/\delta)} = \infty. \qquad (26)$$

The proof of Theorem 4 follows from Lemmas 1 and 2.


Remark 6 Above, we assumed that $\mathcal{L}$ contains distributions that are not bounded from above. The analysis is essentially unchanged if we instead assume that it contains distributions that are not bounded from below. One way to see this is to note that finding the population with the lower mean is equivalent to finding the one with the higher mean in this two population setting. This is true even for $d > 2$ populations with minor adjustments in the analysis. Then, if the random variables of each population are unbounded from below, one may consider the negatives of these random variables, which are then unbounded from above.


5.2 More general setting 


It is easy to extend this analysis in a variety of ways. For example, suppose our interest, given two populations, is to identify the one with the smallest $p$th quantile. Recall that for any real valued random variable $Z$ with distribution function $F_Z(\cdot)$, the quantile function is the (generalised) inverse of $F_Z(\cdot)$, and in particular the $p$th quantile is given by

$$F_Z^{-1}(p) = \inf\{x : F_Z(x) \ge p\}.$$


The analysis easily generalises to handle such cases. 

We consider random elements taking values in a general state space $\mathcal{X}$. Let $h$ denote a mapping from a probability distribution on $\mathcal{X}$ to the real line. The $p$th quantile is one example of such a mapping (when $\mathcal{X} = \mathbb{R}$) from a probability distribution of a real-valued random variable to the real line. Let $\mathcal{H}$ denote a collection of probability distributions on $\mathcal{X}$ such that $h(G) < \infty$ for all $G \in \mathcal{H}$. Suppose that in comparing two distributions $F$ and $G \in \mathcal{H}$, our aim is to identify the one attaining $\min(h(F), h(G))$.

Now define a policy $\mathcal{P}(\varepsilon,\delta)$ operating on distributions in $\mathcal{H}$ with the property that, for $F, G \in \mathcal{H}$ such that $|h(F) - h(G)| > \varepsilon > 0$, the policy $\mathcal{P}(\varepsilon,\delta)$ generates samples from the two distributions, adaptively deciding which distribution to sample from next, and at some stage selects one of the two populations as the one with the lower $h$ value. The policy guarantees that $P(FS) \le \delta$. As before, $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$ guarantees this asymptotically.

In the discussion at the beginning of Section 5.1, where the two populations $(X_i : i \ge 1)$ and $(Y_i : i \ge 1)$ are compared under the two probability measures $P_a$ and $P_b$, replace $\mu_F$, $\mu_G$ and $\mu_{\tilde G}$ with $h(F)$, $h(G)$ and $h(\tilde G)$. Then, as in Lemma 1, it follows that under $\mathcal{P}_{\mathrm{asymp}}(\varepsilon,\delta)$,

$$\liminf_{\delta\to 0}\frac{E_a T_2(\delta)}{\log(1/\delta)} \ge \sup_{\tilde G\in\mathcal{H}:\ h(\tilde G) > h(F) + \varepsilon,\ \tilde G \sim G}\frac{1}{I(G, \tilde G)}. \qquad (27)$$

Remark 7 In the case where $\mathcal{X} = \mathbb{R}$ and $h(F)$ denotes the $p$th quantile of $F$, it is easy to see that the RHS of (27) can be infinite. We show this when $\mathcal{H}$ includes mixtures of Gaussian distributions. Consider $p < 1/2$.

For $\mu > 0$ and $0 < \varepsilon < p$, consider the pdfs

$$g(x) = \frac{1}{\sqrt{2\pi}}\Big(p\exp(-x^2/2) + (1-p)\exp(-(x-\mu)^2/2)\Big)$$
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and 


9 e{x) = \j J (Qo - e) exp(— rr 2 /2) + (1 - p - e) exp(-(x - /i) 2 /2)) . 

Note that the p th quantile of g, rn g < p/2. The p th quantile of g e , m 9e > m(e ) where 
m(e) solves 


p + e 



]_ /-m(e) 


= W — / exp(—(x — gy /2)dx = — 


2vr 





2vr 


exp(— x 2 /2)dx. 


Using the bounds available for Gaussian tail probabilities, it can be seen that 

g-m(e) ~ 2y/\og(p/e) 


as e —> 0. In particular it follows that for small e, m ge — m g can be made arbitrarily large 
by increasing p. 

Now consider, the Kullback-Leibler distance 

r log m) g(x)dx. 

7-00 \9e{x)J 

This easily seen to be bounded from above by log(p/(p — e)) and can be made arbitrarily 
close to zero by choosing e sufficiently close to zero. The case p > 1/2 is easily handled by 
considering p < 0 in definition of g and g £ . 
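The construction in Remark 7 can be checked numerically. The Python sketch below is only an illustration: the helper names and the parameter values ($p = 0.3$, $\epsilon = 0.01$, $\mu \in \{6, 10\}$) are our own choices. It takes $g$ and $g_\epsilon$ as Gaussian mixtures with weights $(p, 1-p)$ and $(p-\epsilon, 1-p+\epsilon)$ on $N(0,1)$ and $N(\mu,1)$, computes their $p$th quantiles by bisection and approximates the Kullback-Leibler distance with a midpoint rule. The quantile gap grows with $\mu$ while the divergence stays below $\log(p/(p-\epsilon))$.

```python
import math

def phi_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_pdf(x, w0, mu):
    """Density of w0 * N(0,1) + (1 - w0) * N(mu,1)."""
    return (w0 * math.exp(-x * x / 2) + (1 - w0) * math.exp(-(x - mu) ** 2 / 2)) / math.sqrt(2 * math.pi)

def mixture_cdf(x, w0, mu):
    return w0 * phi_cdf(x) + (1 - w0) * phi_cdf(x - mu)

def quantile(q, w0, mu, iters=100):
    """inf{x : F(x) >= q}, found by bisection on [-20, mu + 20]."""
    lo, hi = -20.0, mu + 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, w0, mu) >= q:
            hi = mid
        else:
            lo = mid
    return hi

def kl_divergence(p, eps, mu, n=60000):
    """Midpoint-rule approximation of the integral of g log(g / g_eps)."""
    lo, hi = -15.0, mu + 15.0
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        g = mixture_pdf(x, p, mu)
        g_eps = mixture_pdf(x, p - eps, mu)  # weights (p - eps, 1 - p + eps)
        total += g * math.log(g / g_eps) * step
    return total

p, eps = 0.3, 0.01
for mu in (6.0, 10.0):
    gap = quantile(p, p - eps, mu) - quantile(p, p, mu)
    print(mu, round(gap, 3), kl_divergence(p, eps, mu), math.log(p / (p - eps)))
```

The divergence bound does not depend on $\mu$, which is why the ratio in (27) can be made unbounded.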


6 Positive results - the non-adaptive algorithms 


In this section, we show that under conditions on moments of strictly increasing non-negative convex functions of the underlying populations, for any given $\epsilon, \delta > 0$, one can develop non-adaptive $\mathcal{P}(\epsilon,\delta)$ algorithms that require a deterministic and known $O(\log(1/\delta))$ computational effort for any instance of underlying populations. These algorithms rely on truncating the generated random samples and carefully bounding the truncation error using explicitly available moment bounds.

Specifically, we suppose (as in the Introduction) that we can sample independently from $d$ different random variables $(X(i) : i \le d)$. Further, from each population $i$, we can generate independent identically distributed samples $(X(i,j) : j \ge 1)$. Recall that the distribution of $(X(i) : i \le d)$ is unknown and our aim is to find

$$i^* = \arg\min_{1\le j\le d} EX(j).$$


First, in Section 6.0.1, we assume that $X(i) \in [0,b]$ a.s. for each $i \le d$, for some $b > 0$, and review the $\mathcal{P}(\epsilon,\delta)$ algorithms that require $O(\log(1/\delta))$ computational effort in that simple setting. The analysis is straightforward and is discussed (independently, it appears) in, e.g., Even-Dar et al. (2002, 2006) and Glynn and Juneja (2004, 2011). In Section 6.1, we derive the maximum expected error that may result from either appropriately truncating or capping a random variable when an upper bound on the expectation of a strictly increasing, non-negative convex function of the random variable is explicitly known. In Section 6.1.1, we develop $\mathcal{P}(\epsilon,\delta)$ algorithms that require $O(\log(1/\delta))$ computational effort when appropriate moment upper bounds are available for each population. Such bounds can often be found in simulation models by the use of Lyapunov function based techniques.


6.0.1 $\mathcal{P}(\epsilon,\delta)$ policy for bounded random variables

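For $\epsilon, b > 0$, consider

$$\mathcal{X}_\epsilon(b) = \{(X(i) : i \le d) : EX(i^*) \le EX(j) - \epsilon\ \forall j \ne i^*,\ X(i) \in [0,b]\ \forall i\}.$$

A reasonable algorithm on $\mathcal{X}_\epsilon(b)$, for a well chosen $n$ (discussed later), is:

• Generate independent samples $(X(i,j) : i = 1,\ldots,d \text{ and } j = 1,\ldots,n)$.

• Let $\bar X(i) = \frac{1}{n}\sum_{j=1}^n X(i,j)$. Declare

$$\hat i = \arg\min_{1\le i\le d}\bar X(i)$$

as the best design.

Recall that false selection occurs if $\hat i \ne i^*$, which has probability

$$P\Big(\bar X(i^*) > \min_{j\ne i^*}\bar X(j)\Big).$$

This is bounded from above by

$$\sum_{j\ne i^*}P(\bar X(i^*) > \bar X(j)).$$

Since each difference $X(i^*,k) - X(j,k)$ takes values in $[-b,b]$, Hoeffding's inequality gives

$$P(\bar X(i^*) > \bar X(j)) \le P\big(\bar X(i^*) - \bar X(j) - (EX(i^*) - EX(j)) > \epsilon\big) \le \exp\left(-\frac{n\epsilon^2}{2b^2}\right).$$

Thus, $n = \frac{2b^2}{\epsilon^2}\log((d-1)/\delta)$ provides the desired $\mathcal{P}(\epsilon,\delta)$ policy for any set of populations in $\mathcal{X}_\epsilon(b)$.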
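A minimal Python sketch of this non-adaptive policy, assuming each population is given as a zero-argument sampling function with values in $[0,b]$. The function names, the seed and the three uniform test populations (means 0.3, 0.5 and 0.7, so that $\epsilon = 0.2$ is a valid gap) are illustrative choices, not part of the original analysis. The per-population sample size is $n = \lceil (2b^2/\epsilon^2)\log((d-1)/\delta)\rceil$.

```python
import math
import random

def hoeffding_sample_size(b, eps, delta, d):
    """Per-population sample size n = (2 b^2 / eps^2) log((d - 1) / delta), rounded up."""
    return math.ceil(2.0 * b * b / (eps * eps) * math.log((d - 1) / delta))

def select_min_mean(samplers, n):
    """Draw n samples from each population; return the index of the smallest sample mean."""
    means = [sum(sample() for _ in range(n)) / n for sample in samplers]
    return min(range(len(samplers)), key=lambda i: means[i])

rng = random.Random(2024)
# Populations on [0, 1] with means 0.3, 0.5 and 0.7.
samplers = [lambda lo=lo: rng.uniform(lo, lo + 0.6) for lo in (0.0, 0.2, 0.4)]
n = hoeffding_sample_size(b=1.0, eps=0.2, delta=0.05, d=len(samplers))
best = select_min_mean(samplers, n)
print(n, best)
```

The sample size is fixed before any sampling takes place, which is what makes the computational effort deterministic and known in advance.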
6.1 Bounding the truncation error

Suppose that $\mathcal{X}$ is a class of non-negative random variables and $f$ is a strictly increasing non-negative convex function. Examples include $f(x) = x^\alpha$ for $x \ge 0$ and $\alpha > 1$, and $f(x) = \exp(\theta x)$ for $\theta > 0$. We discuss two formulations to bound errors resulting from 1) truncating a random variable, and 2) capping it.

Consider the optimization problem $O_1$:

$$\max_{X\in\mathcal{X}} EXI(X > u) \qquad (28)$$

$$\text{such that } Ef(X) \le c, \qquad (29)$$

for some positive $u$ and $c$. Also consider $O_2$, where the objective function is instead set to

$$\max_{X\in\mathcal{X}} E(X-u)I(X > u),$$

again under constraint (29).

Furthermore, since, by Jensen's inequality,

$$Ef(X) \ge f(EX) \ge f(0), \qquad (30)$$

we assume that $c \ge f(0)$.

$O_1$ gives the worst case expected truncation error under constraint (29) when $X$ is replaced by the truncated $XI(X \le u)$. $O_2$ gives the corresponding (smaller) error when $X$ is replaced by the capped $\min(X,u)$.

First observe that, given any random variable $X$ that satisfies (29), the two-valued random variable $Y$ that takes

• value $E[X \mid X \le u]$ with probability $P(X \le u)$, and

• value $E[X \mid X > u]$ with probability $P(X > u)$,

has the same mean $EY = EX$ and the same objective function values under both $O_1$ and $O_2$, i.e.,

$$EYI(Y > u) = EXI(X > u), \qquad E(Y-u)I(Y > u) = E(X-u)I(X > u).$$

Furthermore, by (conditional) Jensen's inequality, $Ef(Y) \le Ef(X)$, with equality only if $X = Y$ a.s. Thus, only random variables that take at most two values can solve our optimization problems $O_1$ and $O_2$. It is also easy to see that at the optimal solution, in both cases, the constraint (29) has to be tight. Hence, we restrict our search to random variables taking values $0 \le x_1 < x_2$ with probabilities $1-p$ and $p \in [0,1]$, respectively, where

$$(1-p)f(x_1) + pf(x_2) = c,$$

so that $0 \le x_1 \le f^{-1}(c)$ and $x_2 \ge f^{-1}(c)$. Furthermore,

$$p = \frac{c - f(x_1)}{f(x_2) - f(x_1)}.$$

Propositions 3 and 4 below observe that the unique (a.s.) solutions $X_1^*$ and $X_2^*$, respectively, to the optimization problems $O_1$ and $O_2$ are either degenerate and equal $f^{-1}(c)$ with probability 1, or they take value zero with positive probability.

Proposition 3 The unique optimal solution $X_1^*$ for $O_1$:

1. equals $f^{-1}(c)$ with probability 1, for $u < f^{-1}(c)$.

2. For $u \ge f^{-1}(c)$, $X_1^*$ has a two-valued distribution. It equals $u$ with probability

$$\frac{c - f(0)}{f(u) - f(0)},$$

and zero otherwise.

The following assumption considerably eases the analysis of $O_2$.

Assumption 4 Given any $u > 0$, there exists $x_u > u$, the unique solution to

$$x - u = \frac{f(x) - f(0)}{f'(x)}.$$

The above assumption can be seen to hold, e.g., for $f(x) = x^\alpha$, $\alpha > 1$, and for $f(x) = \exp(\theta x)$, $\theta > 0$.

Proposition 4 Under Assumption 4, the unique optimal solution $X_2^*$ for $O_2$:

1. a.s. equals $f^{-1}(c)$ for $x_u \le f^{-1}(c)$.

2. For $x_u > f^{-1}(c)$, $X_2^*$ has a two-valued distribution. It equals $x_u$ with probability

$$\frac{c - f(0)}{f(x_u) - f(0)},$$

and zero otherwise.


Remark 8 Suppose that $f(x) = x^\alpha$, $\alpha > 1$. Under $O_1$, the solution corresponds to $X_1^* = c^{1/\alpha}$ with probability 1 for $u < c^{1/\alpha}$, and the associated objective function value equals $c^{1/\alpha}$. Otherwise, $X_1^*$ takes two values $0$ and $u$, where the probability of the latter equals $cu^{-\alpha}$. The objective function value then equals

$$cu^{1-\alpha}.$$

Under $O_2$, for $u > 0$,

$$x_u = \frac{\alpha u}{\alpha - 1}.$$

Then, $X_2^* = c^{1/\alpha}$ with probability 1 for $x_u \le c^{1/\alpha}$, that is, for $u \le (\alpha-1)c^{1/\alpha}/\alpha$, and the associated objective function value equals $c^{1/\alpha} - u$. Otherwise, $X_2^*$ takes two values $0$ and $x_u$, where the probability of the latter equals

$$cx_u^{-\alpha} = cu^{-\alpha}\left(\frac{\alpha-1}{\alpha}\right)^\alpha.$$

The optimal objective function value equals

$$cu^{1-\alpha}\frac{(\alpha-1)^{\alpha-1}}{\alpha^\alpha}. \qquad (31)$$

Thus, for $u \ge c^{1/\alpha}$, the worst case expected error is multiplied by the factor $(\alpha-1)^{\alpha-1}/\alpha^\alpha < 1$ if we use $\min(X,u)$ instead of $XI(X \le u)$ to get random variables bounded from above by $u$.

Also note that under $O_1$, if $f(x) = \exp(\theta x)$ for some $\theta > 0$, then $X_1^* = (\log c)/\theta$ with probability 1 for $u < (\log c)/\theta$; otherwise, $X_1^*$ takes two values $0$ and $u$, where the probability of the latter equals

$$\frac{c-1}{\exp(\theta u) - 1}.$$

The objective function value in this case equals

$$\frac{u(c-1)}{\exp(\theta u) - 1}.$$

Remark 9 In $O_1$ and $O_2$, if we replace the constraint (29), $Ef(X) \le c$, with

$$Ef_i(X) \le c_i$$

for $i = 1,\ldots,d$, where each $f_i$ is a convex function and each $c_i$ is a constant, then the previous analysis indicates that the solution search may, without loss of generality, be restricted to two-valued random variables. This remains true even if $X$ is real valued and no longer restricted to be non-negative.


6.1.1 Analysis for unbounded random variables 

We now return to the problem of finding $\mathcal{P}(\epsilon,\delta)$ algorithms that require a deterministic and known $O(\log(1/\delta))$ computational effort for any instance of underlying populations, when upper bounds on the expectations of strictly increasing non-negative convex functions of the underlying random variables are explicitly known. We focus on using capping to bound the random variables, because this has smaller error compared to truncating.

Specifically, we make the following assumption:
Assumption 5 There exists a non-increasing function $R$ so that, for each $i \le d$,

$$E(X(i) - R(x))I(X(i) > R(x)) \le x,$$

for all $x > 0$.

Remark 10 To see sufficient conditions for this assumption to hold, suppose that we are given strictly convex, increasing, non-negative and twice differentiable functions $(f_i : i \le d)$ and positive constants $(c_i : i \le d)$ such that

$$Ef_i(X(i)) \le c_i$$

for $i \le d$. Then (see Proposition 4),

$$E(X(i) - u)I(X(i) > u) \le h_i(u),$$

where

$$h_i(u) = (x_{i,u} - u)\frac{c_i - f_i(0)}{f_i(x_{i,u}) - f_i(0)},$$

and $x_{i,u}$ uniquely satisfies

$$x_{i,u} - u = \frac{f_i(x_{i,u}) - f_i(0)}{f_i'(x_{i,u})}.$$

Differentiating both sides with respect to $u$, it follows that

$$\frac{f_i(x_{i,u}) - f_i(0)}{f_i'(x_{i,u})^2}\,f_i''(x_{i,u})\,x_{i,u}' = 1.$$

In particular, $x_{i,u}' > 0$. Thus,

$$h_i(u) = \frac{c_i - f_i(0)}{f_i'(x_{i,u})}$$

is a strictly decreasing function of $u$. Let

$$r_i(x) = h_i^{-1}(x).$$

Then, $r_i(x)$ is a strictly decreasing function of $x$. In particular,

$$E(X(i) - r_i(x))I(X(i) > r_i(x)) \le x,$$

for all $x > 0$. Observe that

$$R(x) = \max_{i\le d} r_i(x)$$

satisfies Assumption 5.
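When $h_i$ has no convenient closed form, $r_i = h_i^{-1}$ can be computed numerically. The Python sketch below is one way to do this. It assumes that $f_i$ and $f_i'$ are supplied as callables and that Assumption 4 holds, so that $x_{i,u}$ is the unique root of $x - u - (f_i(x) - f_i(0))/f_i'(x)$ on $(u,\infty)$. The nested-bisection approach and the helper names are our own illustrative choices. For $f(x) = x^\alpha$ the result can be compared with the closed form $r(x) = (c/x)^{1/(\alpha-1)}(\alpha-1)/\alpha^{\alpha/(\alpha-1)}$ implied by (31).

```python
import math

def bisect(g, lo, hi, iters=200):
    """Root of g on [lo, hi], assuming g(lo) <= 0 < g(hi) and a single sign change."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_x_u(u, f, fprime):
    """x_u > u solving x - u = (f(x) - f(0)) / f'(x), as in Assumption 4."""
    def g(x):
        return x - u - (f(x) - f(0.0)) / fprime(x)
    hi = max(2.0 * u, 1.0)
    while g(hi) <= 0.0:              # expand the bracket until the sign changes
        hi *= 2.0
    return bisect(g, u, hi)

def h(u, c, f, fprime):
    """h(u) = (c - f(0)) / f'(x_u), a bound on E(X - u)I(X > u) when E f(X) <= c."""
    return (c - f(0.0)) / fprime(solve_x_u(u, f, fprime))

def r(x, c, f, fprime):
    """r(x) = h^{-1}(x), using that h is strictly decreasing in u."""
    lo = hi = 1.0
    for _ in range(200):             # move lo left until h(lo) >= x
        if h(lo, c, f, fprime) >= x:
            break
        lo /= 2.0
    else:
        raise ValueError("x exceeds h(u) for every tested u")
    while h(hi, c, f, fprime) > x:   # move hi right until h(hi) <= x
        hi *= 2.0
    return bisect(lambda u: x - h(u, c, f, fprime), lo, hi, iters=100)

alpha, c = 3.0, 2.0
f = lambda x: x ** alpha
fprime = lambda x: alpha * x ** (alpha - 1)
closed = (c / 0.1) ** (1 / (alpha - 1)) * (alpha - 1) / alpha ** (alpha / (alpha - 1))
print(r(0.1, c, f, fprime), closed)
```

Taking the maximum of such $r_i(x)$ over $i \le d$ then gives a threshold function $R$ for Assumption 5.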
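The proposed $\mathcal{P}(\epsilon,\delta)$ algorithm, which for $\beta \in (0,1)$ requires

$$n_\beta = \frac{2R(\beta\epsilon)^2}{\epsilon^2(1-\beta)^2}\log((d-1)/\delta)$$

samples from each population, is:

• Generate independent samples $(X(i,j) : i = 1,\ldots,d \text{ and } j = 1,\ldots,n_\beta)$. Let $Y(i,j) = \min(X(i,j), R(\beta\epsilon))$ for all $i, j$.

• Let $\bar Y(i) = \frac{1}{n_\beta}\sum_{j=1}^{n_\beta}Y(i,j)$. Declare

$$\hat i = \arg\min_{1\le i\le d}\bar Y(i)$$

as the best design.

The probability of false selection corresponds to

$$P\Big(\bar Y(i^*) > \min_{j\ne i^*}\bar Y(j)\Big). \qquad (32)$$

Repeating the analysis in Section 6.0.1 with $b = R(\beta\epsilon)$, and keeping in mind that for $j \ne i^*$, by Assumption 5 with $x = \beta\epsilon$,

$$EY(i^*) - EY(j) = E\min(X(i^*), R(\beta\epsilon)) - E\min(X(j), R(\beta\epsilon)) \le EX(i^*) - (EX(j) - \beta\epsilon) \le -(1-\beta)\epsilon,$$

we conclude that (32) is bounded from above by $\delta$.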
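The capped algorithm can be sketched in Python as follows; this is an illustration under stated assumptions, not a reference implementation. The test populations are exponential with means 1, 1.5 and 2. Since $EX^2 = 2\,(EX)^2$ for an exponential random variable, we take $f_i(x) = x^2$ and $c_i = 2\mu_i^2$. With $\alpha = 2$, the choice $R(x) = \max_i c_i/(4x)$ and $\beta = 1/2$ then follows. The populations, the seed and the function names are our own assumptions. The per-population sample size is $n_\beta = \lceil 2R(\beta\epsilon)^2\log((d-1)/\delta)/(\epsilon^2(1-\beta)^2)\rceil$.

```python
import math
import random

def capped_selection(samplers, R, eps, delta, beta):
    """Non-adaptive selection of the smallest mean using samples capped at R(beta * eps)."""
    d = len(samplers)
    cap = R(beta * eps)
    n = math.ceil(2.0 * cap ** 2 / (eps ** 2 * (1.0 - beta) ** 2) * math.log((d - 1) / delta))
    capped_means = [sum(min(sample(), cap) for _ in range(n)) / n for sample in samplers]
    best = min(range(d), key=lambda i: capped_means[i])
    return best, n

rng = random.Random(7)
means = (1.0, 1.5, 2.0)
samplers = [lambda m=m: rng.expovariate(1.0 / m) for m in means]
c_max = max(2.0 * m * m for m in means)   # E X^2 = 2 m^2 for an exponential with mean m
R = lambda x: c_max / (4.0 * x)           # r_i(x) = c_i / (4x) when alpha = 2
best, n = capped_selection(samplers, R, eps=0.5, delta=0.05, beta=0.5)
print(best, n)
```

Capping introduces a downward bias of at most $\beta\epsilon$ per population, and this is why only the reduced gap $(1-\beta)\epsilon$ enters the Hoeffding step.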


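Remark 11 The $\beta$ that minimises $n_\beta$ corresponds to that minimising

$$\frac{R(\beta\epsilon)^2}{(1-\beta)^2}.$$

We solve this when $f_i(x) = x^\alpha$, $\alpha > 1$, for all $i$. Then, it can be seen from Remark 8 that

$$r_i(x) = \left(\frac{c_i}{x}\right)^{1/(\alpha-1)}\frac{\alpha-1}{\alpha^{\alpha/(\alpha-1)}},$$

and we can set

$$R(x) = \frac{\alpha-1}{\alpha^{\alpha/(\alpha-1)}}\left(\frac{\max_{i\le d}c_i}{x}\right)^{1/(\alpha-1)}.$$

Then minimising $n_\beta$ corresponds to maximising

$$\beta^{2/(\alpha-1)}(1-\beta)^2,$$

which is achieved at $\beta = 1/\alpha$. Thus the bound on the number of samples needed equals $n_{1/\alpha}$, or

$$\frac{2R(\epsilon/\alpha)^2}{\epsilon^2(1-1/\alpha)^2}\log((d-1)/\delta) = 2\left(\frac{\max_{i\le d}c_i}{\epsilon^\alpha}\right)^{2/(\alpha-1)}\log((d-1)/\delta).$$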
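A quick numerical check of Remark 11, written as a Python sketch with illustrative parameter values of our own choosing ($\alpha = 3$, $\max_i c_i = 5$, $\epsilon = 0.2$, $\delta = 0.01$, $d = 4$). A grid search over $\beta$ recovers $\beta = 1/\alpha$. At that point, $n_\beta$ agrees with the closed form $2(\max_i c_i/\epsilon^\alpha)^{2/(\alpha-1)}\log((d-1)/\delta)$.

```python
import math

def R_power(x, alpha, c_max):
    """R(x) for f_i(x) = x^alpha, i.e. the largest of the inverse bounds r_i(x)."""
    return (alpha - 1) / alpha ** (alpha / (alpha - 1)) * (c_max / x) ** (1 / (alpha - 1))

def n_beta(beta, eps, delta, d, alpha, c_max):
    """Per-population sample size of the capped algorithm for a given beta."""
    cap = R_power(beta * eps, alpha, c_max)
    return 2 * cap ** 2 / (eps ** 2 * (1 - beta) ** 2) * math.log((d - 1) / delta)

alpha, c_max, eps, delta, d = 3.0, 5.0, 0.2, 0.01, 4
grid = [k / 10000 for k in range(1, 10000)]
beta_best = min(grid, key=lambda b: n_beta(b, eps, delta, d, alpha, c_max))
closed = 2 * (c_max / eps ** alpha) ** (2 / (alpha - 1)) * math.log((d - 1) / delta)
print(beta_best, n_beta(1 / alpha, eps, delta, d, alpha, c_max), closed)
```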
7 Sequential pure exploration algorithm

We now review some of the related literature that comes under the broad topic of stochastic multi-armed bandit methodology. We discuss the elegant sequential sampling strategy referred to as the successive elimination algorithm, proposed by Even-Dar et al. (2002, 2006). Although they also propose a slightly more effective median elimination algorithm in that paper, and there have been significant developments on the ordinal optimisation problem since then (see, e.g., Audibert and Bubeck 2010, Jamieson et al. 2013), the sequential algorithm of Even-Dar et al. (2006) is particularly simple and lends itself to a brief discussion. They considered the setting where the underlying random variables were Bernoulli, while we allow generally distributed random variables when explicit bounds on the moments are available. As mentioned in the introduction, for this purpose we use one of the methods proposed in Bubeck et al. (2013), which relies on careful truncation of the underlying random variables. Their analysis focusses on the regret minimisation objective but is easily adapted to our pure exploration setting. We present a minor tweak: while they considered truncations of the form $XI(X \le u)$ to bound a rv $X$, we note the minor performance benefits from using instead the capped random variable $\min(X,u)$. These may also extend to the regret minimisation objective. In addition, we compute bounds on the expected number of samples generated under the pure exploration algorithm for general random variables.

7.1 Sequential algorithm 

We refer to populations as arms in this section. Also, instead of finding an arm with minimum expected cost, in consonance with the literature, we focus on finding the one that maximises expected reward. Thus, we have $d$ arms. Arm $i$, when pulled, gives a reward distributed as $X(i)$, and our aim is to find the best arm

$$i^* = \arg\max_{i\le d}EX(i).$$

Let the maximum above be achieved by a unique arm and let $\Delta_i = EX(i^*) - EX(i) > 0$ for all $i \ne i^*$.

Suppose that there exists a non-negative function $a_m$ with the property that for $i \le d$,

$$P(|\bar X_m(i) - EX(i)| > a_m) \le \frac{c\delta}{dm^2}, \qquad (33)$$

where $\bar X_m(i)$ denotes the average of $m$ i.i.d. samples of $X(i)$, and $c = 6/\pi^2$, so that $\sum_{m\ge1}c/m^2 = 1$.

In the examples that we consider later, $a_m$ is seen to be decreasing for all $m > e$. In our discussions, we will ignore this minor issue and assume that $a_m$ is a decreasing, non-negative function of $m$.

Equation (33), together with a union bound over $m \ge 1$ and $i \le d$, ensures that

$$P(E_\delta) \ge 1 - \delta,$$

where

$$E_\delta = \{|\bar X_m(i) - EX(i)| \le a_m,\ \forall m,\ \forall i \le d\},$$

and this is the rationale for the successive elimination algorithm outlined below.

Successive elimination algorithm

1. Set $m = 1$ and $S = \{1, 2, \ldots, d\}$.

2. For each arm $i$, set $\bar X_1(i) = 0$;

3. Repeat

• Sample every arm $i \in S$ once and let $\bar X_m(i)$ be the average reward of arm $i$ after $m$ trials or pulls;

• Let $\bar X_m(\max) = \max_{i\in S}\bar X_m(i)$;

• For each arm $i \in S$ such that $\bar X_m(\max) - \bar X_m(i) > 2a_m$ do

• set $S = S - \{i\}$;

• end

• $m = m + 1$;

Until $|S| = 1$;


It is easy to see that on the set $E_\delta$ the best arm is never eliminated and that all other arms are eventually eliminated (see Even-Dar et al. 2002, 2006). Also, it can be easily checked using Hoeffding's inequality that

$$a_m = b\sqrt{\frac{2}{m}\log\left(\frac{2dm^2}{c\delta}\right)}$$

works in (33) for random variables with $X(i) - EX(i) \in [-b, +b]$, $i \le d$.
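The successive elimination algorithm can be sketched in Python as follows. The sketch uses the Hoeffding-based radius $a_m = b\sqrt{(2/m)\log(2dm^2/(c\delta))}$ and assumes $c = 6/\pi^2$. Arms are zero-argument sampling functions with $X(i) - EX(i) \in [-b,b]$. The function name, the `max_rounds` safeguard and the Bernoulli test arms are illustrative additions of ours.

```python
import math
import random

def successive_elimination(arms, b, delta, max_rounds=10**6):
    """Return the surviving arm indices and the total number of pulls."""
    d = len(arms)
    c = 6.0 / math.pi ** 2
    active = list(range(d))
    sums = [0.0] * d
    pulls = 0
    m = 0
    while len(active) > 1 and m < max_rounds:
        m += 1
        for i in active:                  # sample every surviving arm once
            sums[i] += arms[i]()
        pulls += len(active)
        a_m = b * math.sqrt(2.0 / m * math.log(2.0 * d * m * m / (c * delta)))
        best_mean = max(sums[i] / m for i in active)
        # keep arms within 2 a_m of the current leader; eliminate the rest
        active = [i for i in active if best_mean - sums[i] / m <= 2.0 * a_m]
    return active, pulls

rng = random.Random(11)
arms = [lambda q=q: 1.0 if rng.random() < q else 0.0 for q in (0.9, 0.5, 0.1)]
survivors, pulls = successive_elimination(arms, b=1.0, delta=0.05)
print(survivors, pulls)
```

Every surviving arm has been pulled exactly $m$ times in round $m$, so `sums[i] / m` is the running average used in the elimination test.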


7.1.1 Explicit moment bounds on random variables 

The key step above was simply to arrive at the set

$$E_i = \{|\bar X_m(i) - EX(i)| \le a_m \text{ for all } m\},$$

which has probability at least $1 - \delta/d$. Bubeck et al. (2013) achieve this when each $X(i)$ has an explicit bound on its moment. They focus on the more interesting case where the bound is on a moment of order $\alpha \in (1,2]$. In this version we also restrict our discussion to this case. The key to their analysis is Lemma 3 below, which relies on truncation and the Bernstein inequality (stated with the proof of Lemma 3). In Lemma 3 we also state the result when the random variables are capped instead of truncated.


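Lemma 3 Let $\delta \in (0,1)$, $\alpha \in (1,2]$, $K > 0$,

$$p(\alpha) = (1+\alpha) + (\sqrt{2} + 1/3),$$

$$\tilde p(\alpha) = \frac{(\alpha-1)^{\alpha-1}}{\alpha^\alpha}(1+\alpha) + (\sqrt{2} + 1/3),$$

and let $(X_i : i \le n)$ be iid samples of the rv $X$. Suppose that $E|X|^\alpha \le K$ and let

$$B_m = \left(\frac{Km}{\log(\delta^{-1})}\right)^{1/\alpha}.$$

Then, with probability at least $1-\delta$,

$$\hat X_n \le EX + p(\alpha)K^{1/\alpha}\left(\frac{\log(\delta^{-1})}{n}\right)^{(\alpha-1)/\alpha}, \qquad (34)$$

where

$$\hat X_n = \frac{1}{n}\sum_{m=1}^n X_mI(|X_m| \le B_m)$$

denotes the empirical mean of truncated samples.

And, with probability at least $1-\delta$,

$$\tilde X_n \le EX + \tilde p(\alpha)K^{1/\alpha}\left(\frac{\log(\delta^{-1})}{n}\right)^{(\alpha-1)/\alpha}, \qquad (35)$$

where

$$\tilde X_n = \frac{1}{n}\sum_{m=1}^n\operatorname{sign}(X_m)\min(|X_m|, B_m)$$

denotes the capped empirical mean.

The reverse of (34),

$$\hat X_n \ge EX - p(\alpha)K^{1/\alpha}\left(\frac{\log(\delta^{-1})}{n}\right)^{(\alpha-1)/\alpha},$$

also holds with probability at least $1-\delta$, following an essentially identical proof. The same is true of the reverse of (35).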
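The truncated and capped empirical means of Lemma 3 can be computed as in the following Python sketch, with per-sample thresholds $B_m = (Km/\log(\delta^{-1}))^{1/\alpha}$. The function names are ours. The capped version clips the magnitude at $B_m$ while keeping the sign of $X_m$, which is our reading of capping for real-valued samples. `lemma3_radius` evaluates the deviation term $p\,K^{1/\alpha}(\log(\delta^{-1})/n)^{(\alpha-1)/\alpha}$ with $p = (1+\alpha) + (\sqrt2 + 1/3)$. For the capped case, it uses our reading of the second constant, in which the first term is scaled by $(\alpha-1)^{\alpha-1}/\alpha^\alpha$.

```python
import math

def truncated_and_capped_means(xs, K, alpha, delta):
    """Empirical means with thresholds B_m = (K m / log(1/delta))^(1/alpha), m = 1, 2, ..."""
    L = math.log(1.0 / delta)
    trunc_sum = capped_sum = 0.0
    for m, x in enumerate(xs, start=1):
        B = (K * m / L) ** (1.0 / alpha)
        trunc_sum += x if abs(x) <= B else 0.0          # truncation: drop large samples
        capped_sum += math.copysign(min(abs(x), B), x)  # capping: clip magnitude at B
    n = len(xs)
    return trunc_sum / n, capped_sum / n

def lemma3_radius(n, K, alpha, delta, capped=False):
    """Deviation term p K^(1/alpha) (log(1/delta)/n)^((alpha-1)/alpha)."""
    scale = (alpha - 1) ** (alpha - 1) / alpha ** alpha if capped else 1.0
    p = scale * (1 + alpha) + math.sqrt(2) + 1.0 / 3.0
    return p * K ** (1 / alpha) * (math.log(1 / delta) / n) ** ((alpha - 1) / alpha)

xs = [0.5, 100.0, 1.0, 2.0]
print(truncated_and_capped_means(xs, K=10.0, alpha=2.0, delta=0.1))
```

On this sample the outlier $100$ is discarded by truncation but replaced by $B_2 \approx 2.95$ under capping, so the capped mean retains some information from large observations.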
7.2 Expected number of samples generated

Lemma 4 below is useful for our analysis.

Lemma 4 Suppose that $a > e$, $b > 1$ and $t = t^* \ge 1$ solves

$$a + b\log t = t.$$

Then,

$$t^* \le a + b\log a + \frac{2b^2}{a}\log(a+b).$$
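Lemma 4 is easy to check numerically. The Python sketch below (function names and test values are ours) solves $t = a + b\log t$ by bisection on $[1,\infty)$ and compares the root with the bound.

```python
import math

def solve_fixed_point(a, b, iters=200):
    """Root t >= 1 of t = a + b log t, by bisection (g(1) = 1 - a < 0 when a > e)."""
    g = lambda t: t - a - b * math.log(t)
    lo, hi = 1.0, a + b
    while g(hi) <= 0.0:          # expand until g changes sign
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lemma4_bound(a, b):
    return a + b * math.log(a) + 2.0 * b * b / a * math.log(a + b)

for a, b in [(3.0, 1.5), (10.0, 2.0), (1000.0, 1.5), (10.0, 100.0)]:
    t = solve_fixed_point(a, b)
    print(a, b, round(t, 4), round(lemma4_bound(a, b), 4))
```

The bound is tight when $a \gg b$ (e.g. $a = 1000$, $b = 1.5$) and loose when $b$ dominates, which is harmless for the small-$\delta$ asymptotics below, where $a$ grows like $\log(1/\delta)$.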


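To compute $ET(i)$, the expected number of times arm $i \ne i^*$ is pulled, note that

$$ET(i) = \sum_{m=1}^\infty P\big(\bar X_m(\max) - \bar X_m(i) \le 2a_m\big) \le \sum_{m=1}^\infty P\big(\bar X_m(i^*) - \bar X_m(i) \le 2a_m\big).$$

For $i \ne i^*$, let

$$\tau_i^* = \inf\{m : 4a_m < \Delta_i\}$$

(recall that $\Delta_i = EX(i^*) - EX(i)$). Then, since $2a_m - \Delta_i < -2a_m$ for $m \ge \tau_i^*$,

$$\begin{aligned} ET(i) &\le \tau_i^* + \sum_{m=\tau_i^*+1}^\infty P\big(\bar X_m(i^*) - EX(i^*) - (\bar X_m(i) - EX(i)) < -2a_m\big)\\ &\le \tau_i^* + \sum_{m=\tau_i^*+1}^\infty\Big(P\big(\bar X_m(i^*) < EX(i^*) - a_m\big) + P\big(\bar X_m(i) > EX(i) + a_m\big)\Big)\\ &\le \tau_i^* + 2\delta/d. \end{aligned}$$

Thus, the total expected number of samples generated for all arms $i \ne i^*$ is bounded from above by

$$\sum_{i\ne i^*}\tau_i^* + 2\delta,$$

and the total number of samples generated is bounded from above by twice this amount.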


7.2.1 Bounded random variables 

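In particular, when $X(i) - EX(i) \in [-b, +b]$, $a_m = b\sqrt{\frac{2}{m}\log\left(\frac{2dm^2}{c\delta}\right)}$ works. Let $m^*$ be the solution to

$$4b\sqrt{\frac{2}{m}\log\left(\frac{2dm^2}{c\delta}\right)} = \Delta_i.$$

Then, $\tau_i^* \le m^* + 1$. Since $m^*$ solves $m = \frac{32b^2}{\Delta_i^2}\log\left(\frac{2d}{c\delta}\right) + \frac{64b^2}{\Delta_i^2}\log m$, using Lemma 4,

$$\tau_i^* \le \frac{32b^2}{\Delta_i^2}\log\left(\frac{2d}{c\delta}\right) + \frac{64b^2}{\Delta_i^2}\log\left(\frac{32b^2}{\Delta_i^2}\log\left(\frac{2d}{c\delta}\right)\right) + 1,$$

plus terms that become small as $\delta$ decreases to zero.

Recall that the total number of samples generated is bounded from above by

$$2\sum_{i\ne i^*}\tau_i^* + 4\delta.$$

Hence the dominant terms in the upper bound for the total expected number of samples, for small $\delta$, are

$$64b^2\log(1/\delta)\sum_{i\ne i^*}\frac{1}{\Delta_i^2}.$$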
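The bound on $\tau_i^*$ can be compared with its exact value in a Python sketch. The sketch assumes $c = 6/\pi^2$ and the Hoeffding-based radius $a_m = b\sqrt{(2/m)\log(2dm^2/(c\delta))}$; the function names and parameter values are illustrative. The bound evaluated below is $m^* + 1$ with the full Lemma 4 estimate, including the remainder $(2B^2/A)\log(A+B)$. Here $A = (32b^2/\Delta_i^2)\log(2d/(c\delta))$ and $B = 64b^2/\Delta_i^2$ play the roles of $a$ and $b$ in Lemma 4.

```python
import math

C = 6.0 / math.pi ** 2   # assumed constant c with sum_m c / m^2 = 1

def a_m(m, b, d, delta):
    """Hoeffding-based confidence radius for rewards with X - EX in [-b, b]."""
    return b * math.sqrt(2.0 / m * math.log(2.0 * d * m * m / (C * delta)))

def tau_star(gap, b, d, delta):
    """Smallest m with 4 a_m < gap."""
    m = 1
    while 4.0 * a_m(m, b, d, delta) >= gap:
        m += 1
    return m

def tau_bound(gap, b, d, delta):
    """Lemma 4 bound on m* plus one, with A and B as in the text above."""
    A = 32.0 * b * b / gap ** 2 * math.log(2.0 * d / (C * delta))
    B = 64.0 * b * b / gap ** 2
    return A + B * math.log(A) + 2.0 * B * B / A * math.log(A + B) + 1.0

for gap in (0.5, 0.25):
    print(gap, tau_star(gap, 1.0, 5, 0.01), round(tau_bound(gap, 1.0, 5, 0.01), 1))
```

Halving the gap roughly quadruples $\tau_i^*$, consistent with the $\Delta_i^{-2}$ dependence of the dominant term.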



7.2.2 Explicit bound on moments 

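When $E|X(i)|^\alpha \le K$ for all $i \le d$, it is easily seen from Lemma 3 (applied, in each direction, with $\delta$ replaced by $c\delta/(2dm^2)$) that

$$a_m = p(\alpha)K^{1/\alpha}\left(\frac{\log(2dm^2/(c\delta))}{m}\right)^{(\alpha-1)/\alpha}$$

satisfies (33), with $\bar X_m(i)$ interpreted as the truncated empirical mean of Lemma 3.

Again, the number of times all the arms $i \ne i^*$ are pulled is bounded from above by

$$\sum_{i\ne i^*}\tau_i^* + 2\delta,$$

where now $\tau_i^*$ is bounded from above by $m^* + 1$ and $m^*$ solves the equation

$$4p(\alpha)K^{1/\alpha}\left(\frac{\log(2dm^2/(c\delta))}{m}\right)^{(\alpha-1)/\alpha} = \Delta_i.$$

From Lemma 4 it follows that

$$m^* \le a + b\log a + \frac{2b^2}{a}\log(a+b),$$

where

$$a = \left(\frac{4p(\alpha)K^{1/\alpha}}{\Delta_i}\right)^{\alpha/(\alpha-1)}\log\left(\frac{2d}{c\delta}\right)$$

and

$$b = 2\left(\frac{4p(\alpha)K^{1/\alpha}}{\Delta_i}\right)^{\alpha/(\alpha-1)}.$$

Hence, the dominant terms in the upper bound for the total expected number of samples, for small $\delta$, are

$$2\left(4p(\alpha)K^{1/\alpha}\right)^{\alpha/(\alpha-1)}\log(1/\delta)\sum_{i\ne i^*}\Delta_i^{-\alpha/(\alpha-1)}.$$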


8 Appendix: Proofs 

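Proof of Theorem 1 Consider $P(I_m(0) \ge a) = P(-\inf_\theta\Lambda_m(\theta) \ge a)$. This is bounded from below by

$$\sup_{\theta\in\Re}P(\Lambda_m(\theta) \le -a).$$

Now

$$P(\Lambda_m(\theta) \le -a) = P\left(\frac{1}{m}\sum_{i=1}^m\exp(\theta X_i) \le e^{-a}\right).$$

Then, from Cramér's Theorem (see Dembo and Zeitouni 1998, Corollary 2.2.19),

$$\lim_{m\to\infty}\frac{1}{m}\log P\left(\frac{1}{m}\sum_{i=1}^m\exp(\theta X_i) \le e^{-a}\right) = -I_\theta(e^{-a}),$$

so that

$$\liminf_{m\to\infty}\frac{1}{m}\log P(I_m(0) \ge a) \ge -\inf_{\theta\in\Re}I_\theta(e^{-a}).$$

To prove the upper bound

$$\limsup_{m\to\infty}\frac{1}{m}\log P(I_m(0) \ge a) \le -\inf_{\theta\in\Re}I_\theta(e^{-a}),$$

in light of Lemma 5 below, we need to show that for any $\theta_1 < 0 < \theta_2$,

$$\limsup_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \le -a\Big) \le -\inf_{\theta\in[\theta_1,\theta_2]}I_\theta(e^{-a}). \qquad (36)$$

We show the above for the interval $[0,\theta_2]$; the interval $[\theta_1, 0]$ with $\theta_1 < 0$ follows analogously. First observe that for $\theta_1 > 0$, by convexity of $\Lambda_m$ and since $\Lambda_m(0) = 0$,

$$\Big\{\inf_{\theta\in[0,\theta_1]}\Lambda_m(\theta) \le -a\Big\} \subset \{\Lambda_m'(0) \le -a/\theta_1\}.$$

Recalling that $\Lambda_m'(0) = \frac{1}{m}\sum_{i=1}^m X_i$ and Assumption 1, it follows that given any constant $K$, there exists $\theta_1$ such that

$$\limsup_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[0,\theta_1]}\Lambda_m(\theta) \le -a\Big) \le -K.$$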
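It therefore suffices to show (36) for $0 < \theta_1 < \theta_2$. To this end, define $\theta_{i,n} = \theta_1 + i\delta/n$ for $i = 0,1,\ldots,n$, where $\delta = \theta_2 - \theta_1$. Let

$$A_{i,n} = [\theta_{i,n}, \theta_{i+1,n}]$$

for $i = 0,1,\ldots,n-1$. Then,

$$P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \le -a\Big) \le \sum_{i=0}^{n-1}P\Big(\inf_{\theta\in A_{i,n}}\Lambda_m(\theta) \le -a\Big).$$

Observe by Jensen's inequality that for $\theta \in A_{i,n}$,

$$\Lambda_m(\theta) \ge \frac{\theta}{\theta_{i,n}}\Lambda_m(\theta_{i,n}).$$

Thus, if $\Lambda_m(\theta_{i,n}) < 0$, since

$$\frac{\theta}{\theta_{i,n}} \le \frac{\theta_{i,n} + \delta/n}{\theta_{i,n}} \le \frac{\theta_1 + \delta/n}{\theta_1},$$

we have

$$\inf_{\theta\in A_{i,n}}\Lambda_m(\theta) \ge \frac{\theta_1 + \delta/n}{\theta_1}\Lambda_m(\theta_{i,n}).$$

Hence,

$$P\Big(\inf_{\theta\in A_{i,n}}\Lambda_m(\theta) \le -a\Big) \le P\left(\Lambda_m(\theta_{i,n}) \le -\frac{a\theta_1}{\theta_1 + \delta/n}\right).$$

Using Chernoff's bound, the RHS in turn is bounded from above by

$$\exp\left(-m\inf_{\theta\in[\theta_1,\theta_2]}I_\theta\left(e^{-a\theta_1/(\theta_1+\delta/n)}\right)\right).$$

We then have

$$P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \le -a\Big) \le n\exp\left(-m\inf_{\theta\in[\theta_1,\theta_2]}I_\theta\left(e^{-a\theta_1/(\theta_1+\delta/n)}\right)\right),$$

so that

$$\limsup_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \le -a\Big) \le -\inf_{\theta\in[\theta_1,\theta_2]}I_\theta\left(e^{-a\theta_1/(\theta_1+\delta/n)}\right).$$

Observing that the derivative $\frac{d}{dx}I_\theta(x)$ is continuous both in $\theta$ and $x$, it follows that

$$\inf_{\theta\in[\theta_1,\theta_2]}I_\theta\left(e^{-a\theta_1/(\theta_1+\delta/n)}\right) \to \inf_{\theta\in[\theta_1,\theta_2]}I_\theta(e^{-a})$$

as $n \to \infty$. This concludes the proof of (36). □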


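Lemma 5 There exist $\theta_1 < 0 < \theta_2$ such that

$$\limsup_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in(\theta_2,\infty)}\Lambda_m(\theta) \le -a\Big) \le -k \qquad (37)$$

and

$$\limsup_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in(-\infty,\theta_1)}\Lambda_m(\theta) \le -a\Big) \le -k, \qquad (38)$$

where $k > \inf_{\theta\in\Re}I_\theta(e^{-a})$.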
Proof of Lemma 5 To see (37), observe that

$$\Big\{\inf_{\theta\in(\theta_2,\infty)}\Lambda_m(\theta) \le -a\Big\} \subset \{\Lambda_m(\theta_2) \le -a\} \cup \{\Lambda_m'(\theta_2) \le 0\}.$$

Recall that

$$\Lambda_m'(\theta) = \frac{\sum_{i=1}^m X_i\exp(\theta X_i)}{\sum_{i=1}^m\exp(\theta X_i)}.$$

Now, due to strict convexity of $\Lambda(\theta)$, $\Lambda(\theta)$ and $\Lambda'(\theta)$ are positive and increase with $\theta$ for all $\theta$ sufficiently large (they may become infinite). Thus, one can find $\theta_2 > 0$ so that

$$\limsup_{m\to\infty}\frac{1}{m}\log P(\Lambda_m(\theta_2) \le -a) \le -k$$

and

$$\limsup_{m\to\infty}\frac{1}{m}\log P(\Lambda_m'(\theta_2) \le 0) \le -k. \qquad (40)$$

To see (40), observe that $P(\Lambda_m'(\theta_2) \le 0)$ equals

$$P\left(\sum_{i=1}^m X_i\exp(\theta_2X_i) \le 0\right),$$

and the function $x\exp(\theta_2x)$ is bounded from below for $\theta_2 > 0$. Equation (38) follows similarly.

□

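Proof of Proposition 1

Consider $0 < \theta_1 < \theta_a < \theta^a < \theta_2$ such that $\theta_1, \theta_2 \in \mathcal{D}_\Lambda^\circ$. Recall that

$$P(I_m(0) \le a) = P\big(\inf_\theta\Lambda_m(\theta) \ge -a\big),$$

and the RHS is greater than or equal to

$$P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a\Big) - P\Big(\inf_{\theta\in(-\infty,\theta_1]}\Lambda_m(\theta) < -a\Big) - P\Big(\inf_{\theta\in[\theta_2,\infty)}\Lambda_m(\theta) < -a\Big).$$

From Corollary 1 it is clear that the last two probability terms in the RHS above are bounded from above by an exponentially decaying term in $m$ as $m \to \infty$. It thus suffices to show that

$$\liminf_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a\Big) \ge 0. \qquad (41)$$

(Since any probability is bounded from above by 1, the reverse direction is trivially true.)

To see (41), note that by Jensen's inequality,

$$\Lambda_m(\theta) \ge \frac{\theta}{\theta_1}\Lambda_m(\theta_1)$$

for $\theta \ge \theta_1$. Hence, it follows that

$$P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a\Big) \ge P(\Lambda_m(\theta_1) \ge -a\theta_1/\theta_2).$$

The RHS, in turn, is bounded from below by

$$P\big(X_1 \ge \log m/\theta_1 - a/\theta_2\big) \ge m^{-\lambda/\theta_1}\exp(\lambda a/\theta_2).$$

Thus, (41) follows.

To see (…), observe that for $\theta > 0$,

$$P(e^{\theta X} > x) \ge x^{-\lambda/\theta}$$

for all $x$ sufficiently large, so that $E\exp(\alpha e^{\theta X}) = +\infty$ for $\alpha, \theta > 0$.

Recall that $I_\theta(e^{\Lambda(\theta)}) = 0$ for all $\theta$ with $\Lambda(\theta) < \infty$. For $e^x > e^{\Lambda(\theta)}$,

$$I_\theta(e^x) = \sup_\alpha\big(\alpha e^x - \log E\exp(\alpha e^{\theta X})\big) = \sup_{\alpha\ge0}\big(\alpha e^x - \log E\exp(\alpha e^{\theta X})\big) = 0$$

($\alpha$ above can be restricted to be non-negative for $e^x > e^{\Lambda(\theta)}$; see Dembo and Zeitouni 1998). In particular, for $-a > -I(0) = \inf_{\theta\in\Re}\Lambda(\theta)$,

$$\sup_{\theta\in[\theta_a,\theta^a]}I_\theta(e^{-a}) = 0.$$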

Proof of Theorem 2

Some notation is useful for this proof.

Let $\tilde P$ be another probability measure under which $(X_i : i \ge 1)$ remain iid, and their distribution is given as

$$\tilde P(X_i \in A) = \frac{E[\exp(\alpha^*e^{\theta^*X})I(X \in A)]}{E[\exp(\alpha^*e^{\theta^*X})]}.$$

Let $\tilde E$ denote the associated expectation operator. Then, (10) implies that $\tilde EX_ie^{\theta^*X_i} = 0$ under Assumption 3.

Note that we need to show that the lower bound

$$\liminf_{m\to\infty}\frac{1}{m}\log P(I_m(0) \le a) \ge -I_{\theta^*}(e^{-a}) \qquad (42)$$

holds. Recall that

$$P(I_m(0) \le a) = P\big(\inf_\theta\Lambda_m(\theta) \ge -a\big).$$
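As in the proof of Proposition 1, consider $0 < \theta_1 < \theta_a < \theta^a < \theta_2$ such that $\theta_1, \theta_2 \in \mathcal{D}_\Lambda^\circ$. It then suffices to show that

$$\liminf_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a\Big) \ge -I_{\theta^*}(e^{-a}).$$

Let $\bar\theta = \max(\theta_2 - \theta^*, \theta^* - \theta_1)$. Observe that for any $\epsilon > 0$, by convexity of $\Lambda_m$ and since $\Lambda_m'(\theta^*) = e^{-\Lambda_m(\theta^*)}\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}$,

$$\begin{aligned} P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a-\epsilon\Big) &\ge P\big(\Lambda_m(\theta^*) - |\Lambda_m'(\theta^*)|\bar\theta \ge -a-\epsilon\big) && (43)\\ &\ge P\big(\Lambda_m(\theta^*) \ge -a,\ |\Lambda_m'(\theta^*)| \le \epsilon/\bar\theta\big) && (44)\\ &= P\Big(\Lambda_m(\theta^*) \ge -a,\ e^{-\Lambda_m(\theta^*)}\Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| \le \epsilon/\bar\theta\Big) && (45)\\ &\ge P\Big(\Lambda_m(\theta^*) \ge -a,\ \Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| \le \epsilon e^{-a}/\bar\theta\Big). && (46) \end{aligned}$$

Note that for any $\delta > 0$, $P(\Lambda_m(\theta^*) \ge -a)$ equals

$$P\Big(\Lambda_m(\theta^*) \ge -a,\ \Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| \le \delta\Big) + P\Big(\Lambda_m(\theta^*) \ge -a,\ \Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| > \delta\Big).$$

It is easy to see (as in the proof of the upper bound in Cramér's Theorem) that

$$P\Big(\Lambda_m(\theta^*) \ge -a,\ \Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| > \delta\Big) \le \exp\big(-mI_{\theta^*}(e^{-a})\big)\,\tilde P\Big(\Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| > \delta\Big).$$

Since $\tilde EX_ie^{\theta^*X_i} = 0$, it follows that there exists $a_\delta > 0$ such that

$$\tilde P\Big(\Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| > \delta\Big) \le 2e^{-a_\delta m}.$$

Now, from Cramér's Theorem,

$$\lim_{m\to\infty}\frac{1}{m}\log P(\Lambda_m(\theta^*) \ge -a) = -I_{\theta^*}(e^{-a}).$$

Thus also, for any $\delta > 0$,

$$\lim_{m\to\infty}\frac{1}{m}\log P\Big(\Lambda_m(\theta^*) \ge -a,\ \Big|\frac{1}{m}\sum_{i=1}^mX_ie^{\theta^*X_i}\Big| \le \delta\Big) = -I_{\theta^*}(e^{-a}).$$

Hence,

$$\liminf_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a-\epsilon\Big) \ge -I_{\theta^*}(e^{-a}).$$

Letting $\epsilon \downarrow 0$, we get

$$\liminf_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\theta_1,\theta_2]}\Lambda_m(\theta) \ge -a\Big) \ge -I_{\theta^*}(e^{-a}),$$

so that the result follows.

□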


Proof of Theorem 3

Note that there exists $n_0$ such that, for $\epsilon > 0$ and small,

$$P\Big(\sum_{i=1}^nX_i > 0\Big) \ge \exp(-nI(0)(1+\epsilon))$$

for all $n \ge n_0$.

To get the lower bound, observe that for $a > I(0)$ and $\epsilon \in (0, I(0))$,

$$\begin{aligned} P\Big(\sum_{i=1}^NX_i > 0\Big) &\ge E\big(\exp(-NI(0)(1+\epsilon))I(N \ge n_0)\big) && (47)\\ &\ge E\exp(-NI(0)(1+\epsilon)) - P(N < n_0). && (48) \end{aligned}$$

Below we show that

$$\limsup_{m\to\infty}\frac{1}{m}\log P(N < n_0) = -\infty. \qquad (49)$$

Hence,

$$\liminf_{m\to\infty}\frac{1}{m}\log P\Big(\sum_{i=1}^NX_i > 0\Big) \ge \liminf_{m\to\infty}\frac{1}{m}\log E\exp(-NI(0)(1+\epsilon)).$$

This in turn, for $b > I(0)$ and $\epsilon \in (0, b - I(0))$, is

$$\begin{aligned} &\ge \sup_\theta\liminf_{m\to\infty}\frac{1}{m}\log E\Big[\exp(-NI(0)(1+\epsilon))\,I\big(\Lambda_m(\theta) \in (-b-\epsilon, -b+\epsilon)\big)\Big] && (50)\\ &\ge \sup_\theta\liminf_{m\to\infty}\frac{1}{m}\log\left[\exp\left(-\frac{m}{b-\epsilon}I(0)(1+\epsilon)\right)P\big(\Lambda_m(\theta) \in (-b-\epsilon, -b+\epsilon)\big)\right] && (51)\\ &\ge -\frac{I(0)(1+\epsilon)}{b-\epsilon} - \inf_\theta I_\theta(e^{-(b-\epsilon)}). && (52) \end{aligned}$$

Since the above is true for arbitrarily small $\epsilon > 0$ and for all $b > I(0)$, the lower bound follows. To see (15), for $c_1 = c_2 = 1$, observe that

$$\inf_{b>0}\left(\frac{I(0)}{b} + \inf_\theta I_\theta(e^{-b})\right) \le 1 + \inf_\theta I_\theta(e^{-I(0)}).$$

Furthermore, for $\theta^*$ such that $I(0) = -\Lambda(\theta^*)$, we have $I_{\theta^*}(e^{-I(0)}) = 0$, so that the RHS above is bounded from above by 1. Strict inequality in (15) can be inferred by differentiation.

To see (49), consider the probability $P(N < n_0)$. This is dominated by

$$P(I_m(0) > m/n_0),$$

which in turn is bounded from above by

$$P(I_m(0) > m_0/n_0)$$

for all $m \ge m_0$.

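The following result is then useful.

Lemma 6 Given any $K > 0$, there exists an $a > 0$ such that

$$\limsup_{m\to\infty}\frac{1}{m}\log P(I_m(0) > a) \le -K.$$

Proof: Observe that

$$P(I_m(0) > a) = P\big(\inf_{\theta\in\Re}\Lambda_m(\theta) < -a\big).$$

Note that for any $\bar\theta$,

$$\Big\{\inf_{\theta\ge\bar\theta}\Lambda_m(\theta) < -a\Big\} \subset \{\Lambda_m(\bar\theta) < -a\} \cup \{\Lambda_m'(\bar\theta) < 0\},$$

and for any $\underline\theta$,

$$\Big\{\inf_{\theta\le\underline\theta}\Lambda_m(\theta) < -a\Big\} \subset \{\Lambda_m(\underline\theta) < -a\} \cup \{\Lambda_m'(\underline\theta) > 0\}.$$

Thus,

$$P\big(\inf_{\theta\in\Re}\Lambda_m(\theta) < -a\big) \le 3P\Big(\inf_{\theta\in[\underline\theta,\bar\theta]}\Lambda_m(\theta) < -a\Big) + P(\Lambda_m'(\underline\theta) > 0) + P(\Lambda_m'(\bar\theta) < 0).$$

Selecting $\underline\theta$ and $\bar\theta$ such that

$$\limsup_{m\to\infty}\frac{1}{m}\log P(\Lambda_m'(\underline\theta) > 0) \le -K$$

and

$$\limsup_{m\to\infty}\frac{1}{m}\log P(\Lambda_m'(\bar\theta) < 0) \le -K,$$

we now show that there exists $a$ so that

$$\limsup_{m\to\infty}\frac{1}{m}\log P\Big(\inf_{\theta\in[\underline\theta,\bar\theta]}\Lambda_m(\theta) < -a\Big) \le -K, \qquad (53)$$

which completes the proof. To see (53), observe that, as in the proof of (36), the LHS is bounded from above by

$$-\inf_{\theta\in[\underline\theta,\bar\theta]}I_\theta(e^{-a}).$$

Furthermore, since

$$I_\theta(e^{-a}) = \sup_\alpha\big(\alpha e^{-a} - \log E\exp(\alpha e^{\theta X})\big),$$

by setting $\alpha = -e^a$, we get

$$\inf_{\theta\in[\underline\theta,\bar\theta]}I_\theta(e^{-a}) \ge -1 - \sup_{\theta\in[\underline\theta,\bar\theta]}\log E\big[\exp(-e^{a+\theta X})\big].$$

The RHS is bounded from below by

$$-1 - \log E\Big[\exp\Big(-e^{a+\inf_{\theta\in[\underline\theta,\bar\theta]}\theta X}\Big)\Big].$$

Since

$$a + \inf_{\theta\in[\underline\theta,\bar\theta]}\theta X \to \infty$$

as $a \to \infty$, it follows by the bounded convergence theorem that

$$\log E\Big[\exp\Big(-e^{a+\inf_{\theta\in[\underline\theta,\bar\theta]}\theta X}\Big)\Big] \to -\infty$$

as $a \to \infty$. In particular, there exists an $a$ sufficiently large so that (53) holds. □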

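Proof of Proposition 2 Note that

$$P(FS) \ge P\big(\bar X_m > 0 \text{ and } \exp(-mI_m(0)) \le \delta\big)$$

for $m = c_1\log(1/\delta)$. Thus, for (21) to hold, it suffices that

$$\liminf_{\delta\to0}\delta^{-1}P\big(\bar X_m > 0 \text{ and } \exp(-mI_m(0)) \le \delta\big) = \infty.$$

Equivalently,

$$\liminf_{\delta\to0}\delta^{-1}P\Big(\Lambda_m'(0) > 0 \text{ and } \inf_{\theta\in\Re}\Lambda_m(\theta) \le -1/c_1\Big) = \infty.$$

For $\theta < 0$, convexity gives $\Lambda_m(\theta) \ge \theta\Lambda_m'(0)$, so that $\Lambda_m(\theta) < 0$ forces $\Lambda_m'(0) > 0$. Hence, for (21) to hold, it suffices to find $\theta < 0$ such that

$$\liminf_{\delta\to0}\delta^{-1}P(\Lambda_m(\theta) \le -1/c_1) = \infty.$$

Since, by Cramér's theorem, for $\Lambda(\theta) > -1/c_1$, for $\epsilon > 0$ and $m$ sufficiently large,

$$P(\Lambda_m(\theta) \le -1/c_1) \ge \exp\big(-m(1+\epsilon)I_\theta(e^{-1/c_1})\big),$$

(21) follows if

$$\liminf_{\delta\to0}\delta^{-1}\exp\big(-m(1+\epsilon)I_\theta(e^{-1/c_1})\big) = \infty$$

for some $\epsilon > 0$. This is implied by (20). □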

Proof of Lemma 1 Define

$$t^*(\delta) = \frac{1}{3I(G,\tilde G)}\log(1/\delta).$$

The proof relies on a contradiction. Suppose that there exists a $\zeta \in (0,1/2)$ such that

$$\liminf_{\delta\to0}\frac{E_aT_2(\delta)}{t^*(\delta)} \le 1 - \zeta. \qquad (54)$$

We then show that this implies that

$$\limsup_{\delta\to0}P_b(J(\delta) = 2)\,\delta^{-1} = \infty,$$

providing the desired contradiction, so that (23) follows.

Note that (54) implies that there exist $\eta \in (0,\zeta)$ and a sequence $\delta_n \to 0$ as $n \to \infty$ such that

$$\sup_n\frac{E_aT_2(\delta_n)}{t^*(\delta_n)} \le 1 - \eta.$$

Let $L(\delta)$ denote the likelihood ratio of $P_b$ with respect to $P_a$ of the generated samples until time $T(\delta) = T_1(\delta) + T_2(\delta)$. Then,

$$P_b(J(\delta) = 2) = E_a[L(\delta)I(J(\delta) = 2)].$$

Since each $X_i$ has the same distribution under $P_a$ as well as $P_b$,

$$L(\delta) = \prod_{i=1}^{T_2(\delta)}\frac{d\tilde G(Y_i)}{dG(Y_i)} = \exp\left(-\sum_{i=1}^{T_2(\delta)}\log\frac{dG(Y_i)}{d\tilde G(Y_i)}\right).$$

Let $S_1(\delta) = \{T_2(\delta) \le 2t^*(\delta)\}$. Note by Markov's inequality that

$$P_a(S_1(\delta_n)^c) \le (1-\eta)/2.$$

By the SLLN,

$$\frac{1}{n}\sum_{j=1}^n\log\frac{dG(Y_j)}{d\tilde G(Y_j)} \to I(G,\tilde G)$$

as $n \to \infty$, under $P_a$. In particular then,

$$\frac{1}{n}\max_{j\le n}\sum_{i=1}^j\log\frac{dG(Y_i)}{d\tilde G(Y_i)} \to I(G,\tilde G),$$

so that

$$P_a\left(\max_{j\le2t^*(\delta_n)}\sum_{i=1}^j\log\frac{dG(Y_i)}{d\tilde G(Y_i)} \le 2I(G,\tilde G)(1+\kappa)t^*(\delta_n)\right) \to 1$$

for any $\kappa > 0$ as $n \to \infty$.

Let

$$S_2(\delta) = \left\{\max_{j\le2t^*(\delta)}\sum_{i=1}^j\log\frac{dG(Y_i)}{d\tilde G(Y_i)} \le 2I(G,\tilde G)(1+\kappa)t^*(\delta)\right\}.$$

Let $S_3(\delta) = \{J(\delta) = 2\}$ and let

$$\mathcal{E}(\delta) = S_1(\delta)\cap S_2(\delta)\cap S_3(\delta). \qquad (55)$$

Since also $P_a(S_3(\delta_n)) \to 1$, it can then be seen that, for sufficiently large $n$, $P_a(\mathcal{E}(\delta_n))$ is bounded from below by a quantity arbitrarily close to $(1+\eta)/2$; in particular, it is bounded away from zero.

Now,

$$P_b(J(\delta_n) = 2) \ge P_b(\mathcal{E}(\delta_n)) = E_a[L(\delta_n)I(\mathcal{E}(\delta_n))].$$

Hence,

$$P_b(J(\delta_n) = 2) \ge P_a(\mathcal{E}(\delta_n))\exp\big(-2I(G,\tilde G)(1+\kappa)t^*(\delta_n)\big) = P_a(\mathcal{E}(\delta_n))\,\delta_n^{2(1+\kappa)/3},$$

which for $\kappa < 1/2$ implies that $P_b(J(\delta_n) = 2)\,\delta_n^{-1} \to \infty$ as $n \to \infty$. This is the desired contradiction that implies (23). □


Proof of Lemma 2 Consider a large $b$ whose value will be fixed later. Furthermore, take $\gamma \in (0,1)$. Construct $G_k$ as follows: set

$$G_k(x) = (1-\gamma)G(x)$$

for all $x < b$, and

$$\bar G_k(x) = \beta\bar G(x)$$

for $x \ge b$, where $\bar G_k(x) = 1 - G_k(x)$ and $\bar G(x) = 1 - G(x)$. Note that

$$\beta = 1 + \gamma\frac{G(b)}{\bar G(b)}.$$

Then,

$$\int_{x\in\Re}\log\left(\frac{dG(x)}{dG_k(x)}\right)dG(x) = -G(b)\log(1-\gamma) - \bar G(b)\log\beta. \qquad (56)$$

By selecting $\gamma = 1 - \exp(-a/2)$, we get

$$-G(b)\log(1-\gamma) \le a/2.$$

Furthermore, then

$$\bar G(b)\log\beta \le \gamma G(b) \le 1 - \exp(-a/2) \le a/2$$

(using $e^{-x} \ge 1 - x$ and $\log(1+x) \le x$ above).

Also, for $b$ such that $G(b^+) = G(b^-)$,

$$\mu_{G_k} = (1-\gamma)\int_{x<b}x\,dG(x) + \left(1 + \gamma\frac{G(b)}{\bar G(b)}\right)\int_{x\ge b}x\,dG(x) \ge \exp(-a/2)\mu_G + (1-\exp(-a/2))G(b)\,b.$$

Since the RHS increases to infinity as $b \to \infty$, one can select $b$ sufficiently large so that $\mu_{G_k} > k$.

□

Proof of Proposition 3 The case $u < f^{-1}(c)$ follows by noting that

$$f(EX) \le Ef(X) \le c,$$

so that $EXI(X > u) \le EX \le f^{-1}(c)$, a bound attained by $X = f^{-1}(c)$ a.s.

Now consider $u \ge f^{-1}(c)$. Suppose that $X_1^*$ takes values $x_1^* \in [0, f^{-1}(c))$ and $x_2^* \ge u$. ($x_2^* < u$ is not possible, as then the optimal objective function value is zero.)

The objective function (28) at the optimal value then equals

$$x_2^*\left(\frac{c - f(x_1^*)}{f(x_2^*) - f(x_1^*)}\right).$$

Note that

$$\frac{c - f(x)}{f(x_2) - f(x)},$$

for $x < f^{-1}(c)$ and $f(x_2) > c$, is a strictly decreasing function of $x$. Thus, $x_1^* = 0$. Now, observing that

$$f(y) - f(0) \le yf'(y)$$

from simple calculus (convexity of $f$), it follows that

$$y\left(\frac{c - f(0)}{f(y) - f(0)}\right)$$

is non-increasing in $y$ for $y \ge u$, and the result follows. □


Proof of Proposition [4j Consider a solution that puts mass at two points x 2 and x\ G 
[0, x 2 ) with probabilities p and 1 —p, respectively. Clearly, then x\ < / _1 (c) and x 2 > /~ 1 (c). 

First suppose that an optimal solution has x\ > u with positive probability. We show 
that this leads to a contradiction. Note that x\ > u only if u < / -1 (c). In that case, the 
objective function equals 


Consider the function 


/ * c — fix j) . * . 


(57) 


ix — 


fix) - fixt) 


This is clearly non-increasing in x. Thus x 2 in (57) equals / -1 (c) with probability 1 providing 
the desired contradiction. 

Now suppose that x\ < u. Then, to achieve a positive value of the objective function, we 
need x* 2 > u. I 11 particular, the optimal objective function equals 


$$(x_2^* - u)\left(\frac{c - f(x_1^*)}{f(x_2^*) - f(x_1^*)}\right). \qquad (58)$$


Note that

$$\frac{c - f(x)}{f(x_2^*) - f(x)},$$

for $x < \min(x_2^*, f^{-1}(c))$ and $f(x_2^*) > c$, is a strictly decreasing function of $x$. Thus, $x_1^* = 0$ in (58).

Now consider the function

$$\frac{x - u}{f(x) - f(0)}$$

for $x > u$. Due to Assumption R1, this is maximised at $x_u$. Thus, if $x_u > f^{-1}(c)$, then clearly $x_2^* = x_u$ in (58). Otherwise, $x_2^* = f^{-1}(c)$, i.e., $X^* = f^{-1}(c)$ with probability 1. The proposition stands proved. □
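A brute-force check of Proposition 4 over two-point distributions. It rests on assumptions not stated in this section: the objective is taken to be $E(X-u)^+$ subject to $Ef(X) = c$ (our reading, since (57) is then exactly the $x_2^*$-dependent part), and $f(x) = x^2$ is a hypothetical choice for which $(x-u)/(f(x)-f(0))$ is maximised at $x_u = 2u$. With $c = 4$, so that $f^{-1}(c) = 2$, the proof predicts $x_1^* = 0$, $x_2^* = 4$ when $u = 2$, and $X^* = 2$ with probability 1 when $u = 0.5$.

```python
import numpy as np

def best_two_point(u, c, f, f_inv, x_hi=10.0):
    """Grid search over laws putting mass 1 - p at x1 and p at x2, with E f(X) = c.

    Maximises the assumed objective E (X - u)^+.
    """
    root = f_inv(c)
    x1 = np.linspace(0.0, root, 201, endpoint=False)   # x1 in [0, f^{-1}(c))
    x2 = np.linspace(root, x_hi, 801)                  # x2 >= f^{-1}(c)
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    p = (c - f(X1)) / (f(X2) - f(X1))                  # p f(x2) + (1 - p) f(x1) = c
    obj = p * np.maximum(X2 - u, 0.0) + (1 - p) * np.maximum(X1 - u, 0.0)
    i, j = np.unravel_index(np.argmax(obj), obj.shape)
    return x1[i], x2[j], p[i, j], obj[i, j]

f = np.square        # hypothetical f with f(0) = 0
f_inv = np.sqrt
c = 4.0              # so that f^{-1}(c) = 2
results = {u: best_two_point(u, c, f, f_inv) for u in (0.5, 2.0)}
for u, (x1, x2, p, val) in results.items():
    print(f"u={u}: x1*={x1:.2f}, x2*={x2:.2f}, p={p:.3f}, value={val:.4f}, "
          f"predicted x2*={max(2 * u, f_inv(c)):.2f}")
```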
Proof of Lemma 1. First note that $t^* \le (a+b)^2$, for if not, then

$$(a+b)\log t^* > (a+b)^2,$$

so that $\log t^* > a + b$. Since $a + b \ge t^*/\log t^*$, it follows that $\log^2 t^* > t^*$, providing the desired contradiction (since $\log^2 t < t$ for $t \ge 1$).

Now

$$t^* = a + b\log t^* = a + b\log(a + b\log t^*) \le a + b\log(a + 2b\log(a+b)),$$

and the result easily follows. □
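The two bounds on $t^*$ in the proof of Lemma 1 can be checked numerically. Since the lemma statement is not reproduced here, the sketch assumes that $t^*$ is the larger solution of $t = a + b\log t$ with $a, b > 0$; `solve_fixed_point` and the $(a, b)$ pairs are hypothetical.

```python
import math

def solve_fixed_point(a, b, iters=200):
    """Larger solution of t = a + b log t by fixed-point iteration.

    Starting from (a + b)^2, which lies above the larger root, the iterates
    decrease monotonically to it; convergence is geometric when b < t*.
    """
    t = (a + b) ** 2
    for _ in range(iters):
        t = a + b * math.log(t)
    return t

for a, b in [(10.0, 5.0), (1.0, 3.0), (50.0, 20.0)]:
    t_star = solve_fixed_point(a, b)
    bound = a + b * math.log(a + 2 * b * math.log(a + b))
    print(f"a={a}, b={b}: t*={t_star:.4f}, (a+b)^2={(a + b) ** 2:.1f}, bound={bound:.4f}")
```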

The Bernstein inequality below is useful for the proof of Lemma 3.

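Lemma 7 (Bernstein Inequality). Suppose that $(X_i : i \ge 1)$ are iid mean-zero random variables and there exists $M > 0$ such that

$$|X_i| \le M.$$

Then,

$$P\left(\frac{1}{n}\sum_{i=1}^n X_i > t\right) \le \exp\left(-\frac{n^2t^2/2}{\sum_{i=1}^n EX_i^2 + Mnt/3}\right).$$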
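A Monte Carlo sanity check of Lemma 7, assuming $X_i$ uniform on $[-1, 1]$ (so $M = 1$ and $EX_i^2 = 1/3$); the values of $n$, $t$ and the number of trials are arbitrary choices.

```python
import numpy as np

def bernstein_bound(n, t, var, M):
    """Right-hand side of Lemma 7 for iid mean-zero X_i with |X_i| <= M and E X_i^2 = var."""
    return np.exp(-(n**2 * t**2 / 2) / (n * var + M * n * t / 3))

rng = np.random.default_rng(0)
n, t, trials = 100, 0.2, 20_000
samples = rng.uniform(-1.0, 1.0, size=(trials, n))   # mean zero, |X| <= 1
empirical = float(np.mean(samples.mean(axis=1) > t))  # estimate of P(mean > t)
bound = float(bernstein_bound(n, t, var=1 / 3, M=1.0))
print(f"empirical {empirical:.5f} <= bound {bound:.5f}")
```

Here the bound is $\exp(-5)$, while the empirical frequency is roughly an order of magnitude smaller, as expected from a Chernoff-type bound.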

Proof of Lemma 3. The proof of (34) below is taken from Bubeck et al. (2013).


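We would like to show that

$$EX - \frac{1}{n}\sum_{m=1}^n X_m I(|X_m| \le B_m) \qquad (59)$$

is less than or equal to

$$\rho(\alpha)K^{1/\alpha}\left(\frac{\log(\delta^{-1})}{n}\right)^{(\alpha-1)/\alpha}$$

with probability at least $1 - \delta$. Note that (59) equals

$$\frac{1}{n}\sum_{m=1}^n \big(EX - E(XI(|X| \le B_m))\big) + \frac{1}{n}\sum_{m=1}^n \big(E(XI(|X| \le B_m)) - X_m I(|X_m| \le B_m)\big).$$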
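This in turn is less than or equal to

$$\frac{1}{n}\sum_{m=1}^n \frac{K}{B_m^{\alpha-1}} + \sqrt{\frac{2B_n^{2-\alpha}K\log(\delta^{-1})}{n}} + \frac{B_n\log(\delta^{-1})}{3n} \qquad (60)$$

with probability at least $1 - \delta$. For the first sum we used $EX - E(XI(|X| \le B_m)) = E(XI(|X| > B_m)) \le K/B_m^{\alpha-1}$. For the second, we used the Bernstein inequality to bound

$$\frac{1}{n}\sum_{m=1}^n \big(E(XI(|X| \le B_m)) - X_m I(|X_m| \le B_m)\big)$$

by

$$\sqrt{\frac{2B_n^{2-\alpha}K\log(\delta^{-1})}{n}} + \frac{B_n\log(\delta^{-1})}{3n},$$

observing that $EX^2I(|X| \le B) \le KB^{2-\alpha}$ in the inequality.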


Equation (34) follows by plugging the values of $B_m$ into (60) and carrying out some minor simplifications.
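A minimal sketch of the truncated empirical mean in (59). The truncation levels $B_m = (Km/\log(\delta^{-1}))^{1/\alpha}$ are an assumption: they are the choice made by Bubeck et al. (2013), rewritten with the moment condition $E|X|^\alpha \le K$, and may differ from the values of $B_m$ used to obtain (34). The test distribution is hypothetical: NumPy's `Generator.pareto(2.5)` samples a Lomax (Pareto II) law with mean $2/3$ and $EX^2 = 8/3 \le K = 3$.

```python
import numpy as np

def truncated_mean(x, alpha, K, delta):
    """(1/n) sum_m X_m 1{|X_m| <= B_m} with assumed levels B_m = (K m / log(1/delta))^(1/alpha)."""
    m = np.arange(1, len(x) + 1)
    B = (K * m / np.log(1.0 / delta)) ** (1.0 / alpha)
    return float(np.mean(np.where(np.abs(x) <= B, x, 0.0)))

rng = np.random.default_rng(1)
alpha, K, delta, n = 2.0, 3.0, 0.01, 10_000
x = rng.pareto(2.5, size=n)          # Lomax(2.5): mean 2/3, E X^2 = 8/3
est = truncated_mean(x, alpha, K, delta)
# Rate appearing in the target bound, without the constant rho(alpha).
scale = K ** (1 / alpha) * (np.log(1 / delta) / n) ** ((alpha - 1) / alpha)
print(f"estimate {est:.4f}, true mean {2 / 3:.4f}, rate scale {scale:.4f}")
```

Because $B_m$ grows with $m$, early samples are truncated aggressively and later ones hardly at all, which is what produces the $\sum_m B_m^{1-\alpha}$ bias term in (60).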

To see (35), in the above analysis replace $X_mI(|X_m| \le B_m)$ and $XI(|X| \le B_m)$ by $\min(X_m, B_m)$ and $\min(X, B_m)$, respectively. Then, using (31), observe that (60) is replaced by

$$\frac{(\alpha-1)^{\alpha-1}}{\alpha^\alpha}\,\frac{1}{n}\sum_{m=1}^n \frac{K}{B_m^{\alpha-1}} + \sqrt{\frac{2B_n^{2-\alpha}K\log(\delta^{-1})}{n}} + \frac{B_n\log(\delta^{-1})}{3n}. \qquad (61)$$

Again, the result follows after simplifications. □
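The constant in front of the bias sum in (61) can be checked numerically. Our reading, which is an assumption since (31) is not reproduced here, is that (31) bounds $EX - E\min(X, B) = E(X-B)^+$ by $\sup_{x>B}(x-B)x^{-\alpha}\,E|X|^\alpha$. Maximising $(x-B)x^{-\alpha}$ at $x = \alpha B/(\alpha-1)$ gives $(\alpha-1)^{\alpha-1}/(\alpha^\alpha B^{\alpha-1})$, which the sketch compares with a grid maximum for hypothetical values of $\alpha$ and $B$.

```python
import numpy as np

def tail_constant(alpha):
    """(alpha - 1)^(alpha - 1) / alpha^alpha, the factor multiplying the bias sum in (61)."""
    return (alpha - 1) ** (alpha - 1) / alpha ** alpha

def sup_ratio(alpha, B, x_max_factor=100.0, num=1_000_001):
    """Grid approximation of sup_{x >= B} (x - B) / x^alpha."""
    x = np.linspace(B, x_max_factor * B, num)
    return float(np.max((x - B) / x ** alpha))

B = 3.0
checks = {}
for alpha in (1.5, 2.0):
    closed_form = tail_constant(alpha) / B ** (alpha - 1)   # attained at x = alpha B / (alpha - 1)
    checks[alpha] = (closed_form, sup_ratio(alpha, B))
    print(alpha, checks[alpha])
```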